RELATED APPLICATIONS
This application is a division of Provisional Application Ser. No. 60/174,360 filed
Jan. 4, 2000, incorporated herein by reference.

BACKGROUND OF THE INVENTION 

1. Field of the Invention.
The present invention relates to processing of compressed audio/visual data, and
more particularly to splicing of streams of audio/visual data.

2. Background Art. 

It has become common practice to compress audio/visual data in order to reduce 
the capacity and bandwidth requirements for storage and transmission. One of the most 
popular audio/video compression techniques is MPEG. MPEG is an acronym for the 
Moving Picture Experts Group, which was set up by the International Standards 
Organization (ISO) to work on compression. MPEG provides a number of different 
variations (MPEG-1, MPEG-2, etc.) to suit different bandwidth and quality constraints. 
MPEG-2, for example, is especially suited to the storage and transmission of broadcast 
quality television programs. 

For the video data, MPEG provides a high degree of compression (up to 200:1) by 
encoding 8x8 blocks of pixels into a set of discrete cosine transform (DCT) 
coefficients, quantizing and encoding the coefficients, and using motion compensation 
techniques to encode most video frames as predictions from or between other frames. In 
particular, the encoded MPEG video stream is comprised of a series of groups of pictures 
(GOPs), and each GOP begins with an independently encoded (intra) I frame and may
include one or more following P frames and B frames. Each I frame can be decoded
without information from any preceding and/or following frame. Decoding of a P frame
requires information from a preceding frame in the GOP. Decoding of a B frame requires
information from a preceding and following frame in the GOP. To minimize decoder
buffer requirements, each B frame is transmitted in reverse of its presentation order, so
that all the information of the other frames required for decoding the B frame will arrive
at the decoder before the B frame.
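
By way of illustration only, the reordering described above can be sketched as follows.
The sketch assumes frames are given as labels in presentation order (for example "I0",
"B1", "P3"); the function name and the labels are hypothetical and are not taken from
the MPEG standard.

```python
def transmission_order(presentation):
    """Reorder frame labels so each B frame follows both frames it depends on."""
    out, pending_b = [], []
    for frame in presentation:
        if frame[0] == "B":
            # Hold the B frame until the following I or P frame has been sent.
            pending_b.append(frame)
        else:
            # Send the I or P frame first, then the B frames that precede it
            # in presentation order.
            out.append(frame)
            out.extend(pending_b)
            pending_b.clear()
    # Any trailing B frames would depend on a frame outside this list.
    out.extend(pending_b)
    return out


gop = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(transmission_order(gop))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

In this sketch each B frame is emitted only after the preceding and following frames it
is predicted from, which is the property the decoder relies upon.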

In addition to the motion compensation techniques for video compression, the
MPEG standard provides a generic framework for combining one or more elementary
streams of digital video and audio, as well as system data, into single or multiple program
transport streams (TS) which are suitable for storage or transmission. The system data
includes information about synchronization, random access, management of buffers to
prevent overflow and underflow, and time stamps for video frames and audio packetized
elementary stream packets. The standard specifies the organization of the elementary
streams and the transport streams, and imposes constraints to enable synchronized
decoding from the audio and video decoding buffers under various conditions.

The MPEG-2 standard is documented in ISO/IEC International Standard (IS)
13818-1, "Information Technology-Generic Coding of Moving Pictures and Associated
Audio Information: Systems," ISO/IEC IS 13818-2, "Information Technology-Generic
Coding of Moving Pictures and Associated Audio Information: Video," and ISO/IEC IS
13818-3, "Information Technology-Generic Coding of Moving Pictures and Associated Audio
Information: Audio," incorporated herein by reference. A concise introduction to MPEG
is given in "A guide to MPEG Fundamentals and Protocol Analysis (Including DVB and
ATSC)," Tektronix Inc., 1997, incorporated herein by reference.

Splicing of audio/visual programs is a common operation performed, for example,
whenever one encoded television program is switched to another. Splicing may be done
for commercial insertion, studio routing, camera switching, and program editing. The
splicing of MPEG encoded audio/visual streams, however, is considerably more difficult
than splicing of the uncompressed audio and video. The P and B frames cannot be
decoded without a preceding I frame, so that cutting into a stream after an I frame renders
the P and B frames meaningless. The P and B frames are considerably smaller than the I
frames, so that the frame boundaries are not evenly spaced and must be dynamically
synchronized between the two streams at the time of the splice. Moreover, because a
video decoder buffer is required to compensate for the uneven spacing of the frame
boundaries in the encoded streams, splicing may cause underflow or overflow of the
video decoder buffer.

The problems of splicing MPEG encoded audio/visual streams are addressed to
some extent in Appendix K, entitled "Splicing Transport Streams," to the MPEG-2
standard ISO/IEC 13818-1 1996. Appendix K recognizes that a splice can be "seamless"
when it does not result in a decoding discontinuity, or a splice can be "non-seamless"
when it results in a decoding discontinuity. In either case, however, it is possible that the
spliced stream will cause buffer overflow.

The Society of Motion Picture and Television Engineers (SMPTE) apparently
thought that the ISO MPEG-2 standard was inadequate with respect to splicing. They
promulgated their own SMPTE Standard 312M, entitled "Splice Points for MPEG-2
Transport Streams," incorporated herein by reference. The SMPTE standard defines
constraints on the encoding of and syntax for MPEG-2 transport streams such that they
may be spliced without modifying the packetized elementary stream (PES) packet
payload. The SMPTE standard includes some constraints applicable to both seamless
and non-seamless splicing, and other constraints that are applicable only to seamless
splicing. For example, for seamless and non-seamless splicing, a splice occurs from an
Out Point on a first stream to an In Point on a second stream. The Out Point is
immediately after an I frame or P frame (in presentation order). The In Point is just
before a sequence header and I frame in a "closed" GOP (i.e., no prediction is allowed
back before the In Point).
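
The Out Point and In Point rules stated above can be sketched, for illustration only, as
a scan over frames listed in presentation order. The Frame record and its field names
are assumptions made for this sketch; they are not syntax elements of SMPTE 312M or of
the MPEG-2 standard.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str                  # "I", "P", or "B"
    gop_start: bool = False    # hypothetical flag: sequence header and GOP begin here
    closed_gop: bool = False   # hypothetical flag: that GOP is closed


def out_points(frames):
    """Indices i where a splice may leave the stream immediately after frames[i]."""
    return [i for i, f in enumerate(frames) if f.kind in ("I", "P")]


def in_points(frames):
    """Indices i where a splice may enter the stream just before frames[i]."""
    return [i for i, f in enumerate(frames)
            if f.kind == "I" and f.gop_start and f.closed_gop]


stream = [
    Frame("I", gop_start=True, closed_gop=True),
    Frame("B"), Frame("B"), Frame("P"),
    Frame("B"), Frame("B"), Frame("P"),
    Frame("I", gop_start=True, closed_gop=False),
]
print(out_points(stream))  # [0, 3, 6, 7]
print(in_points(stream))   # [0]
```

Only the first I frame qualifies as an In Point in this example, because the second GOP
is not marked closed and could therefore contain predictions reaching back before it.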

As further discussed in Norm Hurst and Katie Cornog, "MPEG Splicing: A New
Standard for Television - SMPTE 312M," SMPTE Journal, Nov. 1998, there are two
buffering constraints for seamless splicing. The startup delay at the In Point must be a
particular value, and the ending delay at the Out Point must be one frame less than that.
Also, the old stream must be constructed so that the video decoder buffer (VBV buffer)
would not overflow if the bit rate were suddenly increased to a maximum splice rate for a
period of a splice decoding delay before each Out Point.
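
The first of these buffering constraints can be expressed, as a sketch only, as a check
on two delays measured in 90 kHz clock ticks. The function and parameter names are
hypothetical, and the required delay of 22500 ticks used in the example is an assumed
value chosen for illustration, not a figure taken from the text above. The frame period
of 3003 ticks corresponds to a frame rate of 30000/1001 frames per second.

```python
def delays_match(in_startup_delay, out_ending_delay, frame_period, required_delay):
    """Check the delay constraint described above for a seamless splice.

    The startup delay at the In Point must equal the required value, and the
    ending delay at the Out Point must be one frame period less than that.
    All arguments are integers in the same unit (here, 90 kHz ticks).
    """
    in_ok = in_startup_delay == required_delay
    out_ok = out_ending_delay == required_delay - frame_period
    return in_ok and out_ok


FRAME_PERIOD = 3003      # 90000 * 1001 / 30000 ticks per frame
REQUIRED_DELAY = 22500   # assumed value for illustration (250 ms)

print(delays_match(22500, 22500 - FRAME_PERIOD, FRAME_PERIOD, REQUIRED_DELAY))  # True
print(delays_match(22500, 22500, FRAME_PERIOD, REQUIRED_DELAY))                 # False
```

The second constraint, concerning buffer overflow at the maximum splice rate, would
require a model of the decoder buffer and is not covered by this sketch.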

SUMMARY OF THE INVENTION

In accordance with a first aspect, the invention provides a method of
preparing metadata for splicing of a transport stream. The transport stream includes
video access units encoding video presentation units representing video frames. The
video access units of the transport stream encode the video presentation units using a data
compression technique and contain a variable amount of compressed video data. The
method includes a file server ingesting the transport stream, and storing the transport
stream in a file in data storage. Concurrently with storing the transport stream in the file
in data storage, the file server computes metadata for splicing of the transport stream, and
stores the metadata for splicing in the file.

In accordance with another aspect, the invention provides a data storage device
containing a file of data of a transport stream including video access units encoding video
presentation units representing video frames. The video access units of the transport
stream encode the video presentation units using a data compression technique and
contain a variable amount of compressed video data. The file also contains an index to
groups of pictures (GOPs) in the transport stream. The index to the groups of pictures
includes pointers to transport stream file data of respective ones of the GOPs. The file
further contains attributes of the GOPs computed from the data of the transport stream.
The attributes of the GOPs are also indexed by the index to the groups of pictures.
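
As a minimal sketch of the file organization summarized above, the following shows a
file object that stores transport stream data and, in the same pass, builds a GOP index
holding a pointer and computed attributes for each GOP. The class names, the in-memory
representation, and the particular attributes ("frames" and "bytes") are assumptions
made for illustration; the preferred index format is described with reference to the
drawings.

```python
from dataclasses import dataclass, field


@dataclass
class GopEntry:
    file_offset: int   # pointer to the first byte of this GOP in the stored data
    attributes: dict   # attributes computed from the stream during ingestion


@dataclass
class TsFile:
    gop_index: list = field(default_factory=list)
    data: bytearray = field(default_factory=bytearray)

    def ingest_gop(self, gop_bytes, frame_count):
        """Store one GOP and index it concurrently with the store."""
        entry = GopEntry(
            file_offset=len(self.data),
            attributes={"frames": frame_count, "bytes": len(gop_bytes)},
        )
        self.data += gop_bytes
        self.gop_index.append(entry)

    def read_gop(self, n):
        """Fetch the stored data of the n-th GOP through the index."""
        entry = self.gop_index[n]
        end = entry.file_offset + entry.attributes["bytes"]
        return bytes(self.data[entry.file_offset:end])


ts_file = TsFile()
ts_file.ingest_gop(b"\x47" * 188, frame_count=12)
ts_file.ingest_gop(b"\x47" * 376, frame_count=15)
print(ts_file.gop_index[1].file_offset)  # 188
print(len(ts_file.read_gop(1)))          # 376
```

Because the index entry is written at the moment each GOP is stored, a later splicing
operation can locate a GOP and read its attributes without re-parsing the stream.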

BRIEF DESCRIPTION OF THE DRAWINGS

Other objects and advantages of the invention will become apparent upon reading
the following detailed description with reference to the accompanying drawings, in
which:

FIG. 1 is a block diagram of a video file server;

FIG. 2 is a perspective view showing the use of a set-top decoder box;

FIG. 3 is a block diagram showing a switch for splicing broadcast audio/visual
streams;

FIG. 4 is a block diagram of an MPEG decoder;

FIG. 5 is a diagram of the format of an MPEG transport packet stream;

FIG. 6 is a diagram of the format of an MPEG PES packet;

FIG. 7 is a diagram showing audio and video content in two MPEG transport
streams to be spliced;

FIG. 8 is a diagram showing aligned elementary video and audio streams resulting
from the splicing of the two MPEG transport streams in FIG. 7;

FIG. 9 is a diagram showing that audio access units are not aligned on audio PES
packet boundaries;

FIG. 10 is a logic table showing eight cases for the selection of audio presentation
units to be included in the splicing of two MPEG transport streams;


FIG. 11A is a diagram showing content of video and audio presentation unit
streams for the two MPEG transport streams for a first case in the logic table of FIG. 10;

FIG. 11B is a diagram showing the content of video and audio presentation unit
streams resulting from a first possible splicing of the two MPEG transport streams shown
in FIG. 11A;

FIG. 11C is a diagram showing the content of video and audio presentation unit
streams resulting from a second possible splicing of the two MPEG transport streams
shown in FIG. 11A;

FIG. 12A is a diagram showing content of video and audio presentation unit
streams for the two MPEG transport streams for a second case in the logic table of FIG.
10;

FIG. 12B is a diagram showing the content of video and audio presentation unit
streams resulting from splicing of the two MPEG transport streams shown in FIG. 12A;

FIG. 13A is a diagram showing content of video and audio presentation unit
streams for the two MPEG transport streams for a third case in the logic table of FIG. 10;

FIG. 13B is a diagram showing the content of video and audio presentation unit
streams resulting from splicing of the two MPEG transport streams shown in FIG. 13A;

FIG. 14A is a diagram showing content of video and audio presentation unit
streams for the two MPEG transport streams for a fourth case in the logic table of FIG.
10;

FIG. 14B is a diagram showing the content of video and audio presentation unit
streams resulting from splicing of the two MPEG transport streams shown in FIG. 14A;

FIG. 15A is a diagram showing content of video and audio presentation unit
streams for the two MPEG transport streams for a fifth case in the logic table of FIG. 10;

FIG. 15B is a diagram showing the content of video and audio presentation unit
streams resulting from splicing of the two MPEG transport streams shown in FIG. 15A;

FIG. 16A is a diagram showing content of video and audio presentation unit
streams for the two MPEG transport streams for a sixth case in the logic table of FIG. 10;

FIG. 16B is a diagram showing the content of video and audio presentation unit
streams resulting from splicing of the two MPEG transport streams shown in FIG. 16A;

FIG. 17A is a diagram showing content of video and audio presentation unit 
streams for the two MPEG transport streams for a seventh case in the logic table of FIG. 
10;

FIG. 17B is a diagram showing the content of video and audio presentation unit
streams resulting from a first possible splicing of the two MPEG transport streams shown
in FIG. 17A;

FIG. 17C is a diagram showing the content of video and audio presentation unit
streams resulting from a second possible splicing of the two MPEG transport streams
shown in FIG. 17A;

FIG. 18A is a diagram showing content of video and audio presentation unit
streams for the two MPEG transport streams for an eighth case in the logic table of FIG.
10;

FIG. 18B is a diagram showing the content of video and audio presentation unit
streams resulting from splicing of the two MPEG transport streams shown in FIG. 18A;

FIG. 19 is a flow chart of a procedure for splicing MPEG clips;

FIG. 20A is a graph of video buffer level versus time for decoding the end of a
first MPEG clip;

FIG. 20B is a graph of video buffer level versus time for decoding the beginning
of a second MPEG clip;

FIG. 21 is a graph of video buffer level versus time for decoding of a seamless
splicing of the first MPEG clip to the second MPEG clip;

FIG. 22 is a flow chart of a basic procedure for seamless splicing of video
streams;

FIG. 23 is a first portion of a flow chart of a procedure for splicing video streams;

FIG. 24 is a second portion of the flow chart begun in FIG. 23;

FIG. 25 is a first portion of a flow chart of a procedure for splicing audio streams;

FIG. 26 is a second portion of the flow chart begun in FIG. 25;

FIG. 27 is a logic table showing how the first and second clips for the cases of
FIGS. 11A to 18A should be spliced when the second clip has a high or low mean audio
buffer level close to overflowing or underflowing respectively;

FIG. 28 shows how the first and second clips for the case of FIG. 11A should be
spliced when the second clip has a high mean audio buffer level;

FIG. 29 shows how the first and second clips for the case of FIG. 12A should be
spliced when the second clip has a low mean audio buffer level;

FIG. 30 shows how the first and second clips for the case of FIG. 13A should be
spliced when the second clip has a low mean audio buffer level;

FIG. 31 shows how the first and second clips for the case of FIG. 14A should be
spliced when the second clip has a high mean audio buffer level;

FIG. 32 shows how the first and second clips for the case of FIG. 15A should be
spliced when the second clip has a low mean audio buffer level;

FIG. 33 shows how the first and second clips for the case of FIG. 16A should be
spliced when the second clip has a high mean audio buffer level;

FIG. 34 shows how the first and second clips for the case of FIG. 17A should be
spliced when the second clip has a low mean audio buffer level;

FIG. 35 shows how the first and second clips for the case of FIG. 18A should be
spliced when the second clip has a high mean audio buffer level;

FIG. 36 is a schematic diagram of a digital filter for estimating the average audio
buffer level and standard deviation of the audio buffer level from presentation time
stamps (PTS) and extrapolated program clock reference (PCR) time stamps for an audio
elementary stream;

FIG. 37 is a schematic diagram of circuitry for computing an expected maximum
and an expected minimum audio buffer level from the estimated average audio buffer
level and standard deviation of the average audio buffer level from the digital filter
circuitry in FIG. 36;

FIG. 38 is a flow chart of a procedure for computing an offset for the video
decode time stamps (DTS) of the second clip for splicing the second clip onto the first
clip;

FIG. 39 is a flow chart of a procedure for computing an offset for the audio
presentation time stamps (PTS) of the second clip for splicing the second clip onto the
first clip;

FIG. 40 is a flow chart of a procedure for computing an offset for the program
clock reference (PCR) time stamps of the second clip for splicing the second clip to the
first clip;

FIG. 41 is a flow chart of a procedure for re-stamping a second clip for splicing of
the second clip to the first clip;

FIG. 42 is a diagram of macroblocks in a video frame;

FIG. 43 is a diagram showing non-obsolete audio packets in a first TS stream
following the end of video at an Out Point and null packets and obsolete audio packets in
a second TS stream following the beginning of video at an In Point;

FIG. 44 is a flow chart of a re-formatting procedure that replaces the null packets
and obsolete audio packets in FIG. 43 with the non-obsolete audio packets in FIG. 43;

FIG. 45 is a diagram showing MPEG Transport Stream (TS) metadata
computation and storage of the metadata in the header of an MPEG TS data file;

FIG. 46 is a block diagram of the preferred format of a GOP index introduced in
FIG. 45;

FIG. 47 is a flow chart showing decimation of the GOP index;

FIG. 48 is a flow chart showing metadata computations for a next GOP in an
ingested TS;

FIG. 49 is a block diagram of various blocks in the stream server computer of the
video file server of FIG. 1 for computing MPEG metadata during ingestion of an MPEG
TS, and for performing real-time MPEG processing such as seamless splicing in real-time
during real-time transmission of a spliced MPEG TS;

FIG. 50 is a diagram showing flow of control during a metered file transfer using
the video server of FIG. 1;

FIG. 51 is a block diagram of play lists in the video file server of FIG. 1, showing
that a stream server play list is maintained as a window into a control station play list;
and

FIG. 52 is a flow chart showing the use of seamless splicing for repair of a
temporarily corrupted TS.

While the invention is susceptible to various modifications and alternative forms,
specific embodiments thereof have been shown in the drawings and will be described in
detail. It should be understood, however, that it is not intended to limit the form of the
invention to the particular forms shown, but on the contrary, the intention is to cover all
modifications, equivalents, and alternatives falling within the scope of the invention as
defined by the appended claims.

DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

Turning now to FIG. 1 of the drawings, there is shown a video file server
generally designated 20 which may use the present invention. The video file server 20
includes an array of stream servers 21, at least one control server 28, 29, a cached disk
array storage subsystem 23, and an optional tape silo 24. The video file server 20 is a
high performance, high capacity, and high-availability network-attached data server. It
provides the ability for multiple file systems to exist concurrently over multiple
communication stacks, with shared data access. It also allows multiple physical file
systems to co-exist, each optimized to the needs of a particular data service.

The video file server 20 is managed as a dedicated network appliance, integrated
with popular network operating systems in a way that, other than its superior
performance, is transparent to the end user. It provides specialized support for real-time
data streams used in live, as well as store-and-forward, audio-visual applications.
Therefore, the video file server 20 is suitable for a wide variety of applications such as
image repositories, video on demand, and networked video applications, in addition to
high-end file server applications such as the Network File System (NFS, version 2 and
version 3) (and/or other access protocols), network or on-line backup, fast download, etc.

The clustering of the stream servers 21 as a front end to the cached disk array 23
provides parallelism and scalability. The clustering of random-access memory in the
stream servers 21 provides a large capacity cache memory for video applications.

Each of the stream servers 21 is a high-end commodity computer, providing the
highest performance appropriate for a stream server at the lowest cost. The stream
servers 21 are mounted in a standard 19" wide rack. Each of the stream servers 21, for
example, includes an Intel processor connected to an EISA or PCI bus and at least 64
MB of random-access memory. The number of the stream servers 21, their processor
class (i486, Pentium, etc.) and the amount of random-access memory in each of the
stream servers, are selected for desired performance and capacity characteristics, such as
the number of concurrent users to be serviced, the number of independent multi-media
programs to be accessed concurrently, and the desired latency of access to the
multi-media programs.

Each of the stream servers 21 contains one or more high-performance FWD (fast, 
wide, differential) SCSI connections to the back-end storage array. Each of the stream 
servers 21 may also contain one or more SCSI connections to the optional tape silo 24. 
Each of the stream servers 21 also contains one or more outbound network attachments 
configured on the stream server's EISA or PCI bus. The outbound network attachments, 
for example, are Ethernet, FDDI, ATM, DS1, DS3, or channelized T3 attachments to data 
links to a network 25. Each of the stream servers 21 also includes an additional Ethernet 
connection to a dual redundant internal Ethernet link 26 for coordination of the stream 
servers with each other and with one or more controller servers 28, 29. 

The controller servers 28, 29 are dual redundant computers 28, 29, each of which
is similar to each of the stream servers 21. Each of the dual redundant controller servers
28, 29 has a network attachment to a bidirectional link 30 in the network 25, through
which each of the controller servers 28, 29 can conduct service protocols. The service
protocols include one or more standard management and control protocols such as the
Simple Network Management Protocol (SNMP), and at least one Continuous Media File
Access Protocol supporting real-time multi-media data transmission from the stream
servers 21 to the network 25.

Each of the dual redundant controller servers 28, 29 has an Ethernet connection to
the local Ethernet link 26. Each of the controller servers 28, 29 also has a connection to a
serial link 31 to a media server display and keyboard 32. The controller servers 28, 29
run a conventional operating system (such as Windows NT or UNIX) to provide a
hot-failover redundant configuration. An active one of the dual redundant controller servers
28, 29 functions as a media server controller for the video file server 20. The active one
of the controller servers 28, 29 also allows management and control of the server
resources from the network using standard protocols, such as the Simple Network
Management Protocol (SNMP). The active one of the controller servers 28, 29 may also
provide lock management if lock management is not provided by the cached disk array
23.


For multi-media data transfer, the active one of the controller servers 28, 29
assigns one of the stream servers 21 to the network client 54 requesting multi-media
service. The network 25, for example, has conventional transmission components 53
such as routers or ATM switches that permit any one of the clients 54 to communicate
with any one of the stream servers 21. The active one of the controller servers 28, 29
could assign a stream server to a network client by a protocol sending to the client the
network address of the stream server assigned to send or receive data to or from the
client. Alternatively, the active one of the controller servers 28, 29 could communicate
with a router or switch in the transmission components 53 to establish a data link between
the client and the stream server assigned to the client.

The cached disk array 23 is configured for an open systems network environment.
The cached disk array 23 includes a large capacity semiconductor cache memory 41 and
SCSI adapters 45 providing one or more FWD SCSI links to each of the stream servers
21 and to each of the dual redundant controller servers 28, 29. The disk array 47 may
store data using mirroring or other RAID (redundant array of inexpensive disks)
techniques to recover from single disk failure. Although simple mirroring requires more
storage disks than the more complex RAID techniques, it has been found very useful for
increasing read access bandwidth by a factor of two by simultaneously accessing each of
two mirrored copies of a video data set. Preferably, the cached disk array 23 is a
Symmetrix 5500 (Trademark) cached disk array manufactured by EMC Corporation, 171
South Street, Hopkinton, Mass., 01748-9103.

The tape silo 24 includes an array of SCSI adapters 50 and an array of read/write
stations 51. Each of the read/write stations 51 is connected via a respective one of the
SCSI adapters 50 and a FWD SCSI link to a respective one of the stream servers 21 or
each of the redundant controller servers 28, 29. The read/write stations 51 are controlled
robotically in response to commands from the active one of the controller servers 28, 29
for tape transport functions, and preferably also for mounting and unmounting of tape
cartridges into the read/write stations from storage bins.

Further details regarding the structure and operation of the video file server 20 are
found in Wayne Duso and John Forecast, "System Having Client Sending Edit
Commands to Server During Transmission of Continuous Media from One Clip in Play
[...]
audio elementary streams synchronized to the video elementary stream. By using the
splicing techniques as described below, it is possible for the video file server to make a
seamless transition to a second clip from an intermediate location in a first clip during
real-time audio/video data transmission from the video file server 20 to one of the clients
54. In this regard, for the purposes of interpreting the appended claims, "seamless
splicing" should be understood to mean a process that will produce a spliced transport
stream, the play-out of which is substantially free from any audio-visual artifact that the
human auditory and visual system can detect.


With reference to FIG. 2, there is shown another application for seamless splicing
of MPEG transport streams. In this application, a set-top decoder box 61 receives a
number of MPEG transport streams from a coaxial cable 62. Each of the MPEG
transport streams encodes audio and video information for a respective television
channel. A viewer (not shown) may operate a remote control 63 to select one of the
channels for viewing on a television 64. The decoder box 61 selects the MPEG transport
stream for the desired channel and decodes the transport stream to provide a conventional
audio/visual signal (such as an NTSC composite analog audio/video signal) to the
television set.




In the set-top application as shown in FIG. 2, a problem arises when the viewer
rapidly scans through the channels available from the decoder 61. If a simple
demultiplexer is used to switch from one MPEG transport stream to another from the
cable 62, a considerable time will be required for the decoder to adapt to the context of
the new stream. During this adaptation process, undesirable audio and video
discontinuities may result. One attempt to solve this discontinuity problem is to reset the
decoder, squelch the audio, and freeze the video for a certain amount of time after
switching from one MPEG transport stream to another. However, this approach will
slow down the maximum rate at which the viewer can scan through the channels while
looking for an interesting program to watch.

A preferred solution is to incorporate an MPEG transport stream splicer into the
set-top decoder box. The MPEG splicer would be programmed to perform a seamless
splicing procedure as will be described further below with reference to FIG. 7 et seq. The
MPEG splicer would seamlessly splice from an MPEG transport stream currently viewed
to a selected new MPEG transport stream to produce an encoded MPEG transport stream
that would be decoded in the conventional fashion without significant audio/visual
discontinuities and without a significant delay. The MPEG splicer in the set-top decoder
box would be similar to the MPEG splicer shown in FIG. 3.




FIG. 3 shows a switch 70 for seamless switching between MPEG transport
streams in a broadcast environment. The switch 70 receives MPEG transport streams
from a variety of sources, such as a satellite dish receiver 71, servers 72, 73, 74, and a
studio video camera 75 and an MPEG encoder 76. A conventional method of seamless
switching between MPEG transport streams in a broadcast environment is to decode each
transport stream into a respective series of video frames and one or more corresponding
audio signals, switch between the video frames and corresponding audio signals for one
transport stream and the video frames and corresponding audio signals for another
transport stream, and re-encode the video frames and audio signals to produce the spliced
MPEG transport stream. However, the computational and storage resources needed for
decoding the MPEG transport streams and encoding the spliced video frames and audio
signals can be avoided using the seamless splicing procedure described below.

In the switch 70, a de-multiplexer 77 switches from a current MPEG transport
stream to a new MPEG transport stream. The MPEG transport stream selected by the
multiplexer 77 is received by an MPEG splicer 78, which performs seamless splicing as
described below. The MPEG splicer 78 includes a central processor unit 79 and
random access memory 80. The random access memory provides buffering of the MPEG
transport stream selected by the multiplexer 77 so that at the time of splicing, the splicer
78 will have in the memory 80 a portion of the current MPEG transport stream near the
splice point, and a beginning portion of the new MPEG transport stream. The splicer 78
outputs a spliced MPEG transport stream that can be transmitted to customers, for
example, from a broadcast antenna 81.


i=te. 


16 


With reference to FIG. 4, there is shown a block diagram of an MPEG decoder.
The decoder includes a demultiplexer 90, which receives a transport stream (TS) of
packets. The demultiplexer extracts a stream of video packetized elementary stream (V-
PES) packets, and two streams of audio packetized elementary stream (A-PES) packets.
A video buffer 91 receives the stream of V-PES packets, a first audio buffer 92 receives
the first stream of A-PES packets, and a second audio buffer 93 receives the second
stream of A-PES packets. A video decoder 94 receives the V-PES packets from the
video buffer 91 and produces video presentation units (VPUs). Each VPU, for example,
includes digital data specifying the color and intensity of each pixel in a video frame. A
first audio decoder 95 receives A-PES packets from the first audio buffer 92 and
produces audio presentation units (APUs) for a first audio channel. An audio
presentation unit, for example, includes digital data specifying a series of audio samples
over an interval of time. A second audio decoder 96 receives A-PES packets from the
second audio buffer 93 and produces APUs for a second audio channel. The first and
second channels, for example, are right and left stereo audio channels.

For seamless splicing of MPEG transport streams, it is not necessary to decode
the video and audio elementary streams down to the presentation unit level, nor is it
necessary to simulate the video and audio buffers. Instead, the transport stream need only
be parsed down to the level of the packetized elementary streams and access units, and
the video and audio buffers need be considered only to the extent of avoiding buffer
overflow or underflow. As will be described below, buffer overflow or underflow can be
avoided by estimating buffer level based on program clock reference (PCR) and decode
time stamp (DTS) values. Seamless splicing can be done independently of the method of
audio encoding, although the estimation of buffer level can be made more precise by
taking into consideration certain encoded data statistics, which happen to be dependent
on the type of audio encoding. It is desired to provide a generic splicing method in which
no constraining assumptions are made about various encoding parameters such as frame
rate, audio bit rate, and audio sampling frequency. It is also desired to achieve splicing
directly on the transport streams with as little complexity as possible.

FIG. 5 is a diagram showing the syntax of the MPEG-2 Transport Stream. This
diagram is a relevant portion of Figure F.1 of Annex F of the MPEG-2 standards
document ISO/IEC 13818-1. The MPEG-2 Transport Stream is comprised of a series of
188 byte TS packets, each of which may include video, audio, or control information.
Seamless splicing, as described below, may involve modification of the payload unit start
indicator, the packet identifier (PID), the continuity counter field, the adaptation field
length in the adaptation field, and the program clock reference (PCR) time stamp again
provided in the adaptation field. If the data of a video PES packet or audio PES packet
starts in the payload of a TS packet, then the payload unit start indicator bit is set to a one.
Otherwise, if the TS packet contains the continuation of an already initiated audio or
video PES packet, then the payload unit start indicator bit is set to zero. Very typically
the payload unit start indicator will be changed by setting it to one at the first TS packet
of the audio for the second stream in the spliced Transport Stream. The original
continuity counter values of the second stream are modified so that the continuity counter
values in the spliced TS have consecutive values. The adaptation field length in the
adaptation fields of the last audio TS packet in the first stream and also the first audio TS
packet in the second stream within the spliced TS will typically need to be modified
during splicing in order to insert some stuffing bytes to generate full 188 byte sized valid
transport packets. The original PCR values from the second stream are uniformly
incremented in the spliced TS.
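
By way of a non-limiting illustration, the continuity counter modification described
above can be sketched in Python. The names parse_ts_header and restamp_continuity, and
the dictionary of last counter values, are hypothetical and are not part of the disclosed
splicer. The field positions follow the 4-byte TS packet header of ISO/IEC 13818-1, and
the sketch assumes that the counter advances only on packets that carry a payload.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_header(packet: bytes) -> dict:
    """Extract the header fields that splicing may need to modify."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid 188-byte TS packet")
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }


def restamp_continuity(packets, last_cc):
    """Rewrite the counters of the second stream so that each PID continues
    from last_cc[pid], the final counter value of that PID in the first stream."""
    cc = dict(last_cc)
    out = []
    for pkt in packets:
        hdr = parse_ts_header(pkt)
        pid = hdr["pid"]
        if pid not in cc:
            # PID absent from the first stream: keep its original numbering.
            cc[pid] = hdr["continuity_counter"]
            out.append(bytes(pkt))
            continue
        if hdr["adaptation_field_control"] & 0x01:
            # Packet carries a payload, so the 4-bit counter advances (mod 16).
            cc[pid] = (cc[pid] + 1) & 0x0F
        new = bytearray(pkt)
        new[3] = (new[3] & 0xF0) | cc[pid]
        out.append(bytes(new))
    return out
```

In such a sketch only the low four bits of the fourth header byte are rewritten, so the
PID, the payload unit start indicator, and the payload itself pass through unchanged.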

FIG. 6 is a diagram showing the syntax of an MPEG-2 PES packet. This diagram
is a relevant portion of Figure F.2 of Annex F of the MPEG-2 standards document
ISO/IEC 13818-1. The MPEG-2 PES packet may include video, audio, or control
information. Seamless splicing, as described below, may involve modification of the
PES packet length, and the data alignment indicator and presentation time stamp (PTS)
and decode time stamp (DTS) in the PES header. During splicing, the PES packet length
typically has to be modified for the audio, in two places. The first is the last audio PES
packet of the first stream, and the second is the first audio PES packet of the second
stream; in each, the information about the size often has to be changed.
The size should refer to the bytes preserved in these two audio PES packets after editing
for splicing is made. The data alignment indicator may also change in the first audio PES
packet of the second stream due to deletion of some obsolete audio access units. The
original PTS and DTS values from the second stream are uniformly incremented in the
spliced TS.
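
The uniform incrementing of PTS and DTS values can be illustrated as follows. This is
a minimal sketch, assuming the 5-byte, 33-bit time stamp layout of the PES header in
ISO/IEC 13818-1 (a 4-bit prefix, three groups of time stamp bits, and a marker bit after
each group) with time stamps counted in 90 kHz units. The helper names are hypothetical.

```python
TIMESTAMP_MASK = (1 << 33) - 1  # PTS and DTS are 33-bit counters


def decode_timestamp(field: bytes) -> int:
    """Recover the 33-bit value from a 5-byte PTS or DTS field."""
    return (
        (((field[0] >> 1) & 0x07) << 30)
        | (field[1] << 22)
        | (((field[2] >> 1) & 0x7F) << 15)
        | (field[3] << 7)
        | ((field[4] >> 1) & 0x7F)
    )


def encode_timestamp(value: int, prefix: int) -> bytes:
    """Pack a 33-bit value into 5 bytes, setting the three marker bits to one."""
    value &= TIMESTAMP_MASK
    return bytes([
        ((prefix & 0x0F) << 4) | (((value >> 30) & 0x07) << 1) | 1,
        (value >> 22) & 0xFF,
        (((value >> 15) & 0x7F) << 1) | 1,
        (value >> 7) & 0xFF,
        ((value & 0x7F) << 1) | 1,
    ])


def shift_timestamp(field: bytes, offset: int) -> bytes:
    """Add a constant offset (in 90 kHz ticks), keeping the original prefix."""
    prefix = field[0] >> 4
    return encode_timestamp(decode_timestamp(field) + offset, prefix)
```

Applying shift_timestamp with one and the same offset to every PTS and DTS field of the
second stream is one way to realize the uniform increment; the 33-bit mask makes the
counter wrap around rather than overflow.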

In general, splicing of MPEG-2 Transport Streams involves selecting an end point 
in a first MPEG-2 TS stream, selecting a beginning point in a second MPEG-2 TS 
stream, and combining the content of the first TS stream prior in presentation order to the end
point with the content of the second TS stream subsequent in presentation order to the 
beginning point. Unfortunately, the TS streams are formatted so that the presentation 
order is often different from the order in which the content appears in the TS streams. In 
particular, transport packets including audio information are delayed with respect to 
corresponding transport packets of video information. Moreover, as noted above, the B 
frames appear in the TS streams in reverse of their presentation order with respect to the 
reference frames that immediately follow the B frames. As shown in FIG. 7, for 
example, the first Transport Stream 101 and the second Transport Stream 102 are 
subdivided by a dashed cut line 103 which indicates which of the audio packets (A1) and
video packets (V1) in the first stream appear in presentation order before the end point,
and which of the audio packets (A2) and video packets (V2) in the second stream 102
appear in presentation order after the beginning point. Due to this problem, the transport
streams are parsed prior to splicing to determine the relative presentation time of the
video and audio information around the desired beginning and end points. In addition, 
splicing is more difficult than just removing certain Transport Stream packets from the 
first and second Transport Streams and concatenating the two streams. In general, the 
audio data to keep and the audio data to discard will not be segregated into contiguous 
blocks in the Transport Streams. Typically the splicing operation will involve re- 
formatting of the audio data in the spliced Transport Stream, as discussed below with 
reference to FIG. 43. 

As shown in FIG. 8, the portion of the first Transport Stream prior to the end 
point has been parsed into a video PES stream 111 and an audio PES stream 112, and the
portion of the second Transport Stream after the beginning point has been parsed into a
video PES stream 113 and an aligned audio PES stream 114. The two video PES streams
111, 113 have been joined together at a dashed cut line 115, and the two audio PES
streams have been also joined at the dashed cut line 115. The natural cut point for the
audio stream, however, is not between video PES boundaries, and instead it is between
audio access units (AAU) which are decoded to produce corresponding audio
presentation units (APU). Therefore, there may be a slight gap or overlap at the cut line
115 between the AAUs from the first Transport Stream and the AAUs from the second
Transport Stream. The gap or the overlap is removed during a reformatting operation in 
which the spliced Transport Stream is produced from the parsed video PES stream and 
the parsed audio PES stream. Typically the reformatting operation will slightly shift the 
alignment of the audio presentation units from the second Transport Stream with respect 
to their corresponding video presentation units. 
As shown in FIG. 9, the AAUs are not necessarily aligned on the audio PES
packet boundaries in the elementary stream. There may be fractions of an AAU at the
beginning 116 and/or end 117 of the PES packet payload. The parsing and the
reformatting operations take into account this non-alignment of the AAUs with the PES
packet boundaries. Each AAU, for example, has 576 bytes, and decodes to a 24
millisecond APU, for a sampling frequency of 48 kHz and audio bit rate of 192 kbits/sec.
Of course, the splicing techniques disclosed here can be used with a variety of sampling
rates and audio encoding techniques.
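
The AAU figures quoted above can be checked with a short calculation. The sketch
below assumes an audio frame of 1152 samples, as in MPEG-1 Layer II audio; that frame
length is an assumption made here for illustration and is not stated in the text above.
The function name is likewise hypothetical.

```python
from fractions import Fraction

SAMPLES_PER_AAU = 1152  # assumption: MPEG-1 Layer II frame length


def aau_parameters(sampling_hz, bit_rate_bps, samples_per_aau=SAMPLES_PER_AAU):
    """Return (APU duration in seconds, AAU size in bytes) as exact fractions."""
    duration = Fraction(samples_per_aau, sampling_hz)
    size_bytes = duration * bit_rate_bps / 8
    return duration, size_bytes


duration, size_bytes = aau_parameters(48_000, 192_000)
print(duration * 1000, "msec;", size_bytes, "bytes")
```

For 48 kHz and 192 kbits/sec this reproduces the 24 msec APU and the 576 byte AAU.
Exact fractions are used so that other combinations of sampling rate and bit rate, for
which the size need not be a whole number of bytes, are not silently rounded.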

One problem with the splicing of transport streams is the elimination of any audio
discontinuity at the splice point without causing an excessive or cumulative skew in the
audio buffer level or in the alignment of the audio with the corresponding video. In
general, there will be no alignment of the VPUs and the APUs because the audio and
video frame durations are substantially incommensurate. For example, an MPEG-2 TS
encoding an NTSC television program with an audio sampling frequency of 48 kHz and
audio bit rate of 192 kbits/sec will have a video frame duration (VPU) of 1/29.97 sec. and
an audio frame duration (APU) of 24 msec. In this example, the start of a VPU will be
aligned (in presentation time) with the start of an APU possibly at the beginning of a
stream and then only at multiples of 5 minute increments in time. This implies that later
they will not be aligned again for all practical purposes.

The splicing point between two MPEG-2 Transport Streams is naturally defined
with respect to VPUs. The splicing point, for example, occurs at the end of the VPU for
an Out Point (I or P frame) in the first TS, and at the beginning of the VPU for an In
Point (I frame or a closed GOP) in the second TS. For splicing, the time base of the
second TS is shifted to achieve video presentation continuity.

Because the AAUs are usually not aligned with the VPUs, there is an issue with
respect to the selection of AAUs to be included in the spliced TS. In general, audio
truncation (i.e., positioning of the cut with respect to the stream of AAUs in the first and
second TS) should always be done at the AAU boundaries. Fractional AAUs are useless
because the audio encoding algorithm is such that only whole AAUs can be decoded.
Audio truncation for the ending stream should be done with respect to the end of its last
VPU's presentation interval. Audio truncation for the beginning stream should be done
relative to the beginning of its first VPU's presentation interval. These general rules,
however, are insufficient to precisely specify which AAUs should be selected near the cut
for inclusion in the spliced TS.

A more precise set of rules for selection of AAUs near the cut takes into
consideration the concept of the "best aligned APU" and also takes into consideration the
audio buffer level that would be expected in the beginning (i.e., second) stream absent
splicing. The "best aligned final APU" of the ending (i.e., first) stream is defined as the
APU whose presentation interval ends within one APU interval centered about the time
of the cut. The "best aligned initial APU" of the beginning (i.e., second) stream is
defined as the APU whose presentation interval starts within one APU interval centered
about the time of the cut. As shown in the logic table of FIG. 10, there are eight possible
cases that can be identified in terms of the "best aligned final APU," the "best aligned
initial APU," and the presence of an audio gap or an audio overlap with respect to these
best aligned APUs after the alignment of the VPUs of first and second streams at the cut
point.

In FIG. 10, the APU duration is assumed to be 24 msec only for illustrative
purposes without loss of generality. The eight cases are shown in FIGS. 11A, 12A, 13A,
14A, 15A, 16A, 17A, and 18A, and corresponding splicing solutions are shown in FIGS.
11B, 11C, 12B, 13B, 14B, 15B, 16B, 17B, 17C, and 18B. FIGS. 11B and 11C show
alternative solutions, and FIGS. 17B and 17C show alternative solutions. In FIGS. 11A
to 18B, VPUk designates the VPU of the Out-Point, APUj designates the best aligned
final APU, VPUn designates the VPU of the In-Point, and APUm designates the best
aligned initial APU. Presentation time increases from left to right in the figures, and the
bold dashed line is the cut line at which the beginning presentation time of VPUn
becomes aligned with end presentation time of VPUk.

The decoding logic of FIG. 10 can be implemented in software instructions for
computing delta values, where delta 1 is computed as the end of the presentation time of
the last VPU of the first stream minus the presentation time of the end of the best aligned
final APU of the first stream. The best aligned final APU can be found by computing
such a delta for each APU in the first stream around the time of the cut, and selecting the
APU having such a delta that is within plus or minus one-half of the APU interval. Delta
2 is computed as the beginning of the presentation time interval of the first VPU of the
second stream minus the presentation time of the beginning of the best aligned initial
APU of the second stream. The best aligned initial APU can be found by computing such
a delta for each APU in the second stream around the time of the cut, and selecting the
APU having such a delta that is within plus or minus one-half of the APU interval.
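
One possible software rendering of the delta computation is sketched below, for
illustration only. Times are integers in 90 kHz units (for example, 3003 ticks per NTSC
VPU and 2160 ticks per 24 msec APU), and the function names are hypothetical. The
treatment of an exact tie at one-half of the APU interval (the half-open window used here)
is an assumption, since the text above does not specify it.

```python
def best_aligned_final_apu(video_end, apu_starts, apu_duration):
    """Return (index, delta 1) of the APU whose END lies within plus or minus
    one-half APU interval of the end of the last VPU of the first stream."""
    for j, start in enumerate(apu_starts):
        delta1 = video_end - (start + apu_duration)
        # Doubling avoids fractional ticks: -half < delta1 <= +half.
        if -apu_duration < 2 * delta1 <= apu_duration:
            return j, delta1
    return None


def best_aligned_initial_apu(video_start, apu_starts, apu_duration):
    """Return (index, delta 2) of the APU whose START lies within plus or minus
    one-half APU interval of the start of the first VPU of the second stream."""
    for m, start in enumerate(apu_starts):
        delta2 = video_start - start
        if -apu_duration < 2 * delta2 <= apu_duration:
            return m, delta2
    return None
```

With contiguous APUs, exactly one APU boundary falls inside a window that is one APU
interval wide, so each search yields at most one candidate near the cut.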

aligned initial APU, and modifying the PES packet size (in the corresponding PES packet
header field) accordingly. In addition and as mentioned above, the audio PES packet
carrying the best aligned initial APU and all subsequent audio PES packets are modified
by re-stamping their PTS values to follow in sequence from the PTS value of the audio
PES packet carrying the best aligned final APU. The cases in FIGS. 11A and 17A
involve similar truncation and modification operations, but in these cases either an
additional APU is included in between the best aligned APUs (case of FIG. 11A) or one
of the best aligned APUs is omitted (case of FIG. 17A). For the eight cases of audio
splicing identified in FIG. 10, it is possible to construct a spliced audio elementary stream 
with no holes and no audio PTS discontinuity. As a consequence, an audio/video skew in 
presentation time of magnitude at most half of an APU duration will be introduced 
following the cut point in the spliced stream. This audio splicing technique can be 
repeated any number of times with neither a failure to meet its structural assumptions nor 
a degradation in this audio/video skew performance. The A/V skews introduced by the 
multiple splices do not accumulate. Irrespective of the number of consecutive splices, the 
worst audio/video skew at any point in time will be half of the APU duration. At each 
splice point, at the termination of the APUs and VPUs of the first stream, the total audio 
and video presentation durations up to that point will be almost matching each other, i.e., 
|video_duration - audio_duration| <= (1/2) APU_duration. Therefore the proper
amount of audio data will always be provided by the audio splicing procedure described above.
The resulting audio stream is error-free and MPEG-2 compliant. 

The audio and video elementary streams must be recombined around and 
following the splice point. This is conveniently done by reformatting of spliced 
Transport Stream around and following the splice point. The truncation of the final PES 
packet of the first audio stream will typically necessitate the insertion of some adaptation 
field padding into its last transport packet. The deletion of some AAU data from the
beginning of the second audio stream's initial PES packet will typically necessitate the
editing of at most two audio transport packets.

In any MPEG-2 Transport Stream, the audio bit rate, over the span of a few VAU
durations, is substantially constant. The VAUs, however, are of varying sizes. Therefore
the relative positions of VAUs and AAUs associated with VPUs and APUs almost
aligned in time cannot be maintained constant. Almost always it is the case that the
AAUs are significantly delayed with respect to the corresponding VAUs for which the
decoded representations are almost synchronous. Therefore, splicing to achieve the
solutions for the cases of FIGS. 11A to 18A also involves transport packet buffering and
re-multiplexing. The delayed audio packets near the Out Point in the first TS stream are
temporarily stored in a buffer when the first TS stream is truncated based on the VAU of
the Out Point. Also, the spliced TS is reformatted by deletion of some obsolete audio
packets at the beginning of the second stream around the In Point, and repositioning of
some audio packets of the first stream just following the Out Point into the spliced TS.

With reference to FIG. 19, there is shown a top-level flow chart of the preferred
procedure for splicing MPEG Transport Streams. At least the portions of a first and
second MPEG TS stream around the Out Point and In Point, respectively, are assumed to
be stored in a buffer. The stored MPEG TS data for the first stream will be referred to as
a first clip, and the stored MPEG TS data for the second stream will be referred to as a
second clip.

In a first step 121, the splicing procedure receives an indication of a desired end
frame of the first clip and a desired start frame of the second clip. Next, in step 122, the
splicing procedure finds the closest I frame preceding the desired start frame to be the In
Point for splicing. In step 123, a video splicing subroutine is invoked, as further
described below with reference to FIGS. 23 to 24. In step 124, an audio splicing 
subroutine is invoked, as further described below with reference to FIGS. 25 to 26. 
Finally, in step 125, the concatenation of the first clip up to about the Out Point and the 
second clip subsequent to about the In Point is re-formatted, including re-stamping of the 
PTS and PCR values for the audio and video. 
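
Step 122 can be illustrated with the following sketch. It assumes that the frame types
of the second clip are available as a string in presentation order, and that a desired
start frame which is itself an I frame may serve as the In Point. Both are assumptions
made for illustration, and find_in_point is a hypothetical name.

```python
def find_in_point(frame_types: str, desired_start: int) -> int:
    """Return the index of the closest I frame at or before desired_start.

    frame_types is a string such as "IBBPBBPBBIBBP", one letter per frame.
    """
    if not 0 <= desired_start < len(frame_types):
        raise IndexError("desired start frame lies outside the clip")
    # Walk backwards from the desired start frame until an I frame is met.
    for index in range(desired_start, -1, -1):
        if frame_types[index] == "I":
            return index
    raise ValueError("no I frame at or before the desired start frame")
```

For the frame pattern "IBBPBBPBBIBBP", a desired start frame of 10 would give an In
Point at frame 9, while a desired start frame of 7 would fall back to frame 0.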

Considering now video splicing, the splicing procedure should ensure the absence 
of objectionable video artifacts, preserve the duration of the spliced stream, and if 
possible, keep all of the desired frames in the spliced stream. The duration of the spliced 
stream should be preserved in order to prevent any time drift in the scheduled play-list. 
In some cases, it is not possible to keep all of the original video frames due to buffer 
problems. In such a case, one or more frames of the clip are replaced by frozen frames, 
and this frame replacement is made as invisible as possible. 

Management of the video buffer is an important consideration in ensuring the 
absence of objectionable video artifacts. In a constant bit rate (CBR) and uniform picture 
quality sequence, subsequent pictures typically have coded representations of drastically 
different sizes. The encoder must manage the decoder's buffer within several constraints. 
The buffer should be assumed to have a certain size defined in the MPEG-2 standard. 
The decoder buffer should neither overflow nor underflow. Furthermore, the decoder 
cannot decode a picture before it receives it in full (i.e. completely). Moreover, the 
decoder should not be made to "wait" for the next picture to decode; this means that 
every 40 ms in PAL and 1/29.97 second in NTSC, the decoder must have access to a full 
picture ready to be decoded. 
The MPEG encoder manages the video decoder buffer through decode time
stamps (DTS), presentation time stamps (PTS), and program clock reference (PCR)
values. With reference to FIG. 20A, for example, there is shown the video buffer level
during the playing of a first clip. The x-axis represents the time axis. The video buffer
level initially increases in a linear fashion over a segment 131 as the buffer is loaded at a
constant bit rate. Then over a time span 132, video data is displayed at frame intervals,
and the buffer is replenished at least to some extent between the frame intervals. At a
time T_e, the last video frame's data is finished being loaded into the video buffer. Then
the video buffer is periodically depleted to some extent at each subsequent video frame
interval, and becomes emptied at a time DTS_L1.

FIG. 20B shows the video buffer level for a second clip. The video buffer begins
to receive video data for the second clip at a time PCR_e2. (PCR_e2 is extrapolated from
the value of the most recently received genuine PCR record, to the first byte of the picture
header sync word of the first video frame in the clip to start. The extrapolation adjusts
this most recently received genuine PCR record value by the quotient of the displacement
in data bits of the clip from the position where it appears in the second clip to the position
at which video data of the first frame of the second clip begins, divided by the data
transmission bit rate for transmission of the clip to the decoder.) The video buffer level
initially increases in a linear fashion over a segment 134 as the buffer is loaded at a
constant bit rate. However, the slope of the segment 134 in FIG. 20B may be
substantially different from the slope of the segment 131 in FIG. 20A. In each case, the
slope of the segment is proportional to the bit rate at which the data is loaded into the
video buffer. As shown, the video data of the second clip is received at the video buffer
at a higher bit rate than the video data of the first clip. At a time DTS_F2, the first frame of
the second clip is decoded as more video data from the second clip continues to flow into
the video buffer.
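
The PCR extrapolation described in the parenthetical above can be sketched as follows.
The sketch assumes a constant transmission bit rate, PCR values expressed in 27 MHz
ticks, byte offsets measured from the start of the clip, and no wrap-around of the PCR
counter. The function and parameter names are hypothetical.

```python
PCR_HZ = 27_000_000  # PCR values are counted in 27 MHz ticks


def extrapolate_pcr(last_pcr, last_pcr_byte_pos, target_byte_pos, bit_rate_bps):
    """Extrapolate the most recent genuine PCR record to a later byte position.

    The displacement in bits divided by the bit rate gives the elapsed time,
    which is converted to PCR ticks and added to the genuine PCR value.
    """
    displacement_bits = (target_byte_pos - last_pcr_byte_pos) * 8
    return last_pcr + displacement_bits * PCR_HZ // bit_rate_bps
```

For instance, with a genuine PCR of 1,000,000 ticks at byte 0, a picture header 18,800
bytes later, and a bit rate of 4 Mbits/sec, the displacement of 150,400 bits corresponds
to 37.6 msec, so the extrapolated value would be 2,015,200 ticks.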

When splicing the end of the first clip of FIG. 20A to the beginning of the second
clip of FIG. 20B, there will be a problem of video buffer management if duration of time
DTS_L1-T_e is different from the duration of time DTS_F2-PCR_e2 minus one video frame
(presentation) interval. Because the time PCR_e2 must just follow T_e, there will be a gap
in the decoding and presentation of video frames if DTS_F2-PCR_e2 is substantially greater
than DTS_L1-T_e plus one video frame interval. In this case, the buffer will not be
sufficiently full to begin decoding of the second clip one video frame interval after the
last frame of the first clip has been decoded. Consequently, either the second clip will be
prematurely started to be decoded or the decoder will be forced to repeat a frame one or
more times after the end of the display of the last frame from the first clip to provide the
required delay for the second clip's buffer build-up. In the case of a premature start for
decoding the second clip, a video buffer underflow risk is generated. On the other hand,
in case of repeated frames, the desired frame accuracy for scheduled play-lists is lost,
and a precise timing adjustment cannot be achieved through this procedure either.

If DTS_F2-PCR_e2 is substantially less than DTS_L1-T_e plus one video frame interval,
then the decoder will not be able to decode the first frame of the second clip at the
specified time DTS_F2 because the last frame of the first clip will not yet have been
removed from the video buffer. In this case a video buffer overflow risk is generated.
Video buffer overflow may present a problem not only at the beginning of the second
clip, but also at a subsequent location of the second clip. If the second clip is encoded by
an MPEG-2 compliant encoder, then video buffer underflow or buffer overflow will not
occur at any time during the decoding of the clip. However, this guarantee is no longer
valid if the DTS_F2-PCR_e2 relationship at the beginning of the second clip is altered.
Consequently, to avoid buffer problems, the buffer occupancy at the end of the first clip 
must be modified in some fashion. This problem is inevitable when splicing between 
clips having significantly different ending and starting buffer levels. This is why SMPTE 
has defined some splice types corresponding to well-defined buffer levels. 

In order to seamlessly splice the first clip of FIG. 20A to the second clip of FIG.
20B, the content of the first clip (towards its end) is modified so that PCR_e2 can just
follow T_e (by one byte transmission time) and DTS_F2 can just follow DTS_L1 (by one
video frame presentation interval). FIG. 21 shows the video buffer level for the splicing
of the first clip to the second clip in this fashion. The content around the end of the first
clip has been modified to provide a buffer emptying characteristic shown in dashed lines,
such as the line segments 136, so that the buffer is emptied sooner of video data from the
first clip. In particular, this is done by replacing a frame loaded into the video buffer over
an interval 137 with a "freeze frame" having a selected amount of video data. The
position of DTS_L1 has not changed, the position of DTS_F2 is one video frame interval
after DTS_L1, and the relationship DTS_F2-PCR_e2 is unchanged, but the position of T_e has
been moved to T_e' in order to achieve the desired conditions for seamless video splicing.

FIG. 22 shows a flow chart of a seamless video splicing procedure that obtains the 
desired conditions just described above. In a first step 141, the first DTS of the second 
clip is anchored at one frame interval later than the last DTS of the first clip in order to 
prevent a video decoding discontinuity. Then, in step 142, the procedure branches
depending on whether the PCR extrapolated to the beginning frame of the second clip
falls just after the ending time of the first clip. If so, then the splice will be seamless with
respect to its video content. Otherwise, the procedure branches to step 143. In step 143,
the content of the first clip is adjusted so that the PCR extrapolated to the beginning
frame of the second clip falls just after the ending time of the first clip. Therefore the
desired conditions for seamless video splicing are achieved.

With reference to FIG. 23, there is shown a more detailed flow chart of a seamless
video splicing procedure. In a first step 151, the procedure inspects the content of the
first clip to determine the last DTS/PTS of the first clip. This last DTS/PTS of the first
clip is designated DTS_L1. Next, in step 152, the procedure inspects the content of the first
clip to determine the time of arrival (T_e) of the last byte of the first clip. In step 153, the
procedure adds one frame interval to DTS_L1 to find the desired first DTS location for the
second clip. The sum, designated DTS_F1, is equal to DTS_L1 + 1/FR, where FR is the video
frame rate. In step 154, while keeping the DTS-PCR_e relationship unaltered, the
procedure finds the time instant, designated T_S, at which the first byte of the second clip
should arrive. This is done by calculating T_start = DTS_F2 - PCR_e2, and T_S = DTS_F1 - T_start.
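
Steps 153 and 154 amount to one addition and two subtractions, which can be written
out as a minimal sketch. All times are taken to be in seconds on a common time base and
are held as exact fractions; the function name and the choice of seconds as the unit are
assumptions made for illustration.

```python
from fractions import Fraction


def desired_arrival_time(dts_l1, frame_rate, dts_f2, pcr_e2):
    """Return (DTS_F1, T_start, T_S) for the second clip.

    dts_l1     last DTS of the first clip
    frame_rate video frame rate FR, e.g. Fraction(30000, 1001) for NTSC
    dts_f2     first DTS of the second clip, in that clip's own time base
    pcr_e2     extrapolated PCR at which the second clip starts to arrive
    """
    dts_f1 = Fraction(dts_l1) + 1 / Fraction(frame_rate)  # step 153
    t_start = Fraction(dts_f2) - Fraction(pcr_e2)         # start-up delay of clip 2
    t_s = dts_f1 - t_start                                # step 154
    return dts_f1, t_start, t_s
```

Because T_start is a difference of two values from the second clip, it is unaffected by
the later shift of that clip's time base, which is why the DTS-PCR relationship can be
kept unaltered.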

Continuing in FIG. 24, in step 155, execution branches depending on whether T_S
is equal to T_e plus 8 divided by the bit rate. If not, then the clips to be spliced need
modification before concatenation, and execution branches to step 156. In step 156,
execution branches depending on whether T_S is less than T_e plus 8 divided by the bit rate.
If not, then there is an undesired gap in between the clips to be spliced, and execution
branches to step 157. In step 157, null packets are inserted into the clips to be spliced to
compensate for the gap. The gap to be compensated has a number of bytes, designated
G_r, equal to (T_S - T_e)(BIT_RATE)/8 minus one. If in step 156, T_S is less than T_e plus 8
divided by the bit rate, then execution continues from step 156 to step 158 to open up a
certain amount of space in the first clip to achieve T_S = T_e + 8/(BIT_RATE). The number of
bytes to drop is one plus (T_e - T_S)(BIT_RATE)/8. If possible, the bytes are dropped by
removing null packets. Otherwise, one or, if needed, more predicted video frames are
replaced with smaller, variable-size freeze frames.




8 


ir in step i i s is round to oe equal to i e plus o divided oy tne oit rate, tnen 




9 


execution continues to step 159. Execution also continues to step 159 from steps 157 and 


.=32=. 


10 


Ijo. in step i jy, tne transport streams irom tne two cups are concatenated, rinaiiy, in 


m 


11 


step 160, a subroutine, as described below with reference to FIG. 38, is called to compute 




1 o 
12 


a video time stamp offset, designated as V 0 ff se t. 
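By way of illustration only, the timing test of steps 153 to 158 may be sketched in
Python as follows. The function name, the SplicePlan record, the use of seconds as the
time unit, and the half-byte tolerance used to decide equality in step 155 are all
assumptions made for this sketch and are not taken from the figures.

```python
from dataclasses import dataclass


@dataclass
class SplicePlan:
    t_s: float        # required arrival time of the first byte of the second clip (s)
    action: str       # "concatenate", "insert_null_packets", or "drop_bytes"
    byte_count: int   # bytes of null packets to insert, or bytes to drop


def plan_video_splice(dts_l1, t_e, dts_f2, pcr_e2, frame_rate, bit_rate, tol=None):
    """Hypothetical helper following steps 153-158; all times in seconds."""
    if tol is None:
        tol = 4.0 / bit_rate                  # assumed tolerance: half of one byte time
    dts_f1 = dts_l1 + 1.0 / frame_rate        # step 153: DTS_F1 = DTS_L1 + 1/FR
    t_start = dts_f2 - pcr_e2                 # step 154: T_START = DTS_F2 - PCR_e2
    t_s = dts_f1 - t_start                    #           T_S = DTS_F1 - T_START
    target = t_e + 8.0 / bit_rate             # T_e plus one byte time
    if abs(t_s - target) <= tol:              # step 155: already seamless
        return SplicePlan(t_s, "concatenate", 0)
    if t_s > target:                          # step 157: gap, fill with null packets
        gap = round((t_s - t_e) * bit_rate / 8.0) - 1
        return SplicePlan(t_s, "insert_null_packets", gap)
    drop = 1 + round((t_e - t_s) * bit_rate / 8.0)   # step 158: open up space
    return SplicePlan(t_s, "drop_bytes", drop)
```

In this sketch the byte counts are rounded to whole bytes; an actual implementation
would presumably work in whole transport packets when inserting or removing null packets.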


With reference to FIG. 25, there is shown the beginning of a flow chart of an
audio splicing procedure. In a first step 171, the procedure finds the audio access unit
(AAU) of the first clip best aligned with the end frame of the first clip (in terms of the
ending instants of their presentations) after splicing of the video. Then, in step 172, the
procedure finds the audio access unit (AAU) of the second clip best aligned with the In
Point of the second clip (in terms of the starting instant of its presentation). In step 173,
for the second clip the mean audio buffer level, assuming no modification made for
splicing, is compared to a high threshold, designated B. (B, for example, has a value of
66% of the audio buffer capacity.) If this mean audio buffer level exceeds the high
threshold B, then the procedure branches to step 174. In step 174, if the above-defined
best aligned AAUs do not achieve a backward skew, then the best aligned AAUs are
modified by dropping only one of them in either of the clips to reduce the mean audio
buffer level for the second clip. In step 173, if the mean audio buffer level does not
exceed the high threshold B, then execution continues to step 175. In step 175, the mean
audio buffer level for the second clip, assuming no modification made for splicing, is
compared to a low threshold, designated A. (A, for example, has a value of 33% of the
audio buffer capacity.) If this mean audio buffer level is less than the low threshold A,
then the procedure branches to step 176. In step 176, if the above-defined best aligned
AAUs do not achieve a forward skew, then the best aligned AAUs are modified by
appending only one extra AAU either after the best aligned AAU in the first clip or
before the best aligned AAU in the second clip to increase the mean audio buffer level for
the second clip.

In general, a forward skew of the AAUs from the second stream by incrementing
their presentation time instants tends to increase the mean audio buffer level. Therefore,
a forward skew is good if the mean audio buffer level is low for the second stream. A
backward skew of the AAUs from the second stream by decrementing their presentation
time instants tends to decrease the audio buffer level. Therefore, a backward skew is
good if the mean audio buffer level is high for the second stream.
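The threshold tests of steps 173 to 176 may be sketched, for example, as the following
Python function. The function name, the string labels for the skew and for the returned
action, and the default fractions (taken from the example values of 66% and 33% given
above) are assumptions of this sketch.

```python
def audio_splice_adjustment(mean_level, capacity, skew,
                            high_frac=0.66, low_frac=0.33):
    """Hypothetical decision for steps 173-176.

    mean_level: estimated mean audio buffer level of the second clip (bytes)
    capacity:   audio buffer capacity (bytes)
    skew:       "forward", "backward", or "none" for the best aligned AAU pair
    """
    if mean_level > high_frac * capacity:      # step 173: above high threshold B
        # step 174: a backward skew already lowers the level; otherwise drop one AAU
        return "none" if skew == "backward" else "drop_one_aau"
    if mean_level < low_frac * capacity:       # step 175: below low threshold A
        # step 176: a forward skew already raises the level; otherwise append one AAU
        return "none" if skew == "forward" else "append_one_aau"
    return "none"                              # level is between A and B
```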

In step 175, if the mean audio buffer level is not less than the low threshold A,
then the procedure continues to step 177 in FIG. 26. The procedure continues to step 177
also after steps 174 and 176. In step 177, the procedure removes all AAUs in the first
clip after the best aligned AAU in the first clip, and adjusts the last audio PES packet
header in the first clip to reflect the change in its size in bytes after the removal. In FIG.
26, step 178, the procedure finds the audio PES packet in the second clip which includes
the best aligned AAU in the second clip, and removes all AAUs preceding the best
aligned one in this PES packet. Then in step 179, the procedure produces a PES packet
header to encapsulate the best aligned AAU and the AAUs after it, and writes the PES
packet size into the header. Finally, in step 180, the procedure calculates the required
audio PTS offset (A_OFFSET) to be used for re-stamping the audio of the second clip.

The preferred implementation of the audio splicing routine in FIGS. 25 and 26
uses the logic shown in FIG. 27. Depending on whether the mean audio buffer level for
the second clip, assuming no modifications are made for splicing, is greater than the high
threshold B or less than the low threshold A, the eight cases of FIG. 10 are expanded to
sixteen cases. The preferred solutions for these eight additional cases are shown in FIGS.
28 to 35. When the mean audio buffer level for the second clip, assuming no
modifications are made for splicing, is neither greater than the high threshold B nor less
than the low threshold A, then the solutions shown in FIGS. 11 to 18 are immediately
applicable.



A preferred method of estimating the mean audio buffer level of a clip is to use
the product (PTS_i - PCR_ei)(BIT RATE) as an indication of the audio buffer level. PTS_i
denotes the ith audio PTS time stamp, and PCR_ei denotes the PCR value extrapolated to
the bit position of PTS_i. Because the product (PTS_i - PCR_ei)(BIT RATE) will fluctuate
more rapidly than the mean audio buffer level, the computed values may be processed by
a simple digital filter routine to obtain an estimated value of the mean audio buffer level
at any point of a clip. Shown in FIG. 36, for example, is a digital filter schematic that
includes a single first-order recursive stage 191 for computing an estimate of the mean
audio buffer level ABV. The computation includes a scaling of (PTS_i - PCR_ei)(BIT
RATE) by a factor of 1/n_av, where n_av is the effective number of samples over which
the mean is estimated. The scaled value is added to the previous estimate of the mean
value of ABV scaled by a "forgetting factor" of 1 - 1/n_av. The previous value is stored
in a register 192. In a similar fashion, an estimate of the variance of the audio buffer
level at any point of a clip is computed by similar circuitry or computations depicted in
FIG. 36. For example, the estimate of the variance can be computed by a subtractor 193
that calculates the deviation of each sample of (PTS_i - PCR_ei)(BIT RATE) from the
estimated mean audio buffer level, a squaring unit 194, and another first-order recursive
filter stage generally designated 195.
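A software counterpart of the filter of FIG. 36 might look like the following Python
sketch. The class and attribute names are hypothetical, and the sketch assumes that the
deviation fed to the variance stage is taken from the freshly updated mean; the figure
may equally well use the previous mean.

```python
class BufferLevelEstimator:
    """Hypothetical first-order recursive estimator of mean and variance."""

    def __init__(self, n_av, initial_mean=0.0, initial_var=0.0):
        self.alpha = 1.0 / n_av      # scaling factor 1/n_av
        self.mean = initial_mean     # plays the role of register 192
        self.var = initial_var       # state of the second recursive stage

    def update(self, pts, pcr_e, bit_rate):
        # Instantaneous buffer level indication: (PTS_i - PCR_ei)(BIT RATE)
        sample = (pts - pcr_e) * bit_rate
        # Stage 191: new mean = sample/n_av + (1 - 1/n_av) * previous mean
        self.mean = self.alpha * sample + (1.0 - self.alpha) * self.mean
        # Subtractor 193 and squaring unit 194 (deviation from updated mean: assumption)
        deviation = sample - self.mean
        # Stage 195: same recursion applied to the squared deviation
        self.var = self.alpha * deviation * deviation + (1.0 - self.alpha) * self.var
        return self.mean, self.var
```

A larger n_av gives a smoother but slower-reacting estimate, since each new sample then
contributes only 1/n_av of its value.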

Instead of determining whether the mean audio buffer level is relatively high or
low for a clip, a determination can be made as to whether the audio buffer full level (i.e.,
audio buffer size) is within a certain number of estimated standard deviations from the
estimated mean audio buffer level, or whether the audio buffer empty level (e.g., zero
bytes) is within a certain number of estimated standard deviations from the estimated
mean audio buffer level. In this case, the certain number can be selected based on the
usual statistics of the type of audio encoding that is employed, in order to ensure the
absence of audio buffer underflow or overflow within a desired level of confidence. In
order to make the comparisons very simple at the time of splicing, the maximum and
minimum expected deviations from the estimated average can be computed in advance
for each clip. For example, FIG. 37 shows in schematic form the computations necessary
to compute the maximum of the estimated mean buffer level ABV plus twice the
estimated standard deviation, and to compute the minimum of the estimated mean buffer
level ABV minus twice the standard deviation. The box 198, for example, outputs a
binary value indicating whether or not the input A is greater than the input B. The
symbol 199 denotes a multiplexer or selection step. The symbol 200 denotes a square
root operator block. The other symbols in FIG. 37 have meanings similar to the like
symbols in FIG. 36.




To simplify audio buffer management during splicing transients, it is
recommended to have the same audio buffer levels at the beginning and at the end of the
clips. The case of going from a low to a high audio buffer level is the most problematic,
and is addressed by a sufficiently precise mean buffer level estimate for beyond the
selected In Point.

If there are multiple audio streams for one program, then all of these individual
audio streams are processed independently in the fashion described above for a single
stream. For example, there could be two stereo audio streams for one program, or four
audio streams for quadraphonic sound. The association of the ending (i.e., first) clip and
starting (i.e., second) clip audio streams to splice together depends on the PID of the
streams after PID re-mapping, if there is PID re-mapping, or on the PID of each stream in
the spliced clips, if there is no PID re-mapping. For an audio stream of the ending clip
that has no audio stream in the starting clip that can be associated with it, the preserved
audio packets are played until the end. This will achieve the best possible alignment
between audio and video for the ending clip.




The method used above for seamless audio splicing can also be used for splicing
other elementary streams containing encapsulated data. For example, a TS may have
additional elementary streams of other data encapsulated in access units such as access
units for teletext, closed captioning, VBI, etc. To apply the seamless splicing method to a
TS having multiple elementary streams of non-video and non-audio access units, the
AUs in each elementary stream are found that are best aligned with the first and last
video frames, and an AU sequence over the splice is selected, independent of the content
of the other non-video elementary streams. In this case, the method will minimize skew
with respect to associated video frames and also prevent accumulation of skew from
multiple splices in the TS.

With reference to FIG. 38, there is shown a flow chart of a procedure for
calculating the video time stamp offset V_OFFSET. In a first step 211, the procedure finds
the DTS of the last video frame (in decode order) of the first clip. This DTS of the last
video frame of the first clip is denoted DTS_VL1. Then in step 212, the procedure finds
the original DTS of the first frame to be decoded in the second clip. This DTS of the first
frame to be decoded in the second clip is denoted DTS_VF2. Finally, in step 213, the
video time stamp offset V_OFFSET is computed as DTS_VL1 - DTS_VF2 plus one video
frame duration.

With reference to FIG. 39, there is shown a flow chart of a procedure for
calculating the audio time stamp offset A_OFFSET. In a first step 221, the procedure finds
the PTS of the last AAU of the first clip. This PTS of the last AAU of the first clip is
denoted PTS_AL1. Then in step 222, the procedure finds the original PTS of the first
AAU to be decoded in the second clip. This PTS of the first AAU to be decoded in the
second clip is denoted PTS_AF2. Finally, in step 223, the audio time stamp offset
A_OFFSET is computed as PTS_AL1 - PTS_AF2 plus one AAU duration.

With reference to FIG. 40, there is shown a flow chart of a procedure for
calculating the PCR offset PCR_OFFSET. In a first step 231, the procedure finds the
extrapolated PCR_e for the last byte of the first clip. This extrapolated PCR_e is denoted
PCR_eL1. Then in step 232, the procedure finds the original extrapolated PCR_e for the
first byte of the second clip. This extrapolated PCR_e is denoted PCR_eF2. Finally, in
step 233, the PCR offset PCR_OFFSET is computed as PCR_eL1 - PCR_eF2 plus eight
divided by the bit rate.
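The three offset computations of FIGS. 38 to 40 reduce to one line each, as in the
following Python sketch. The function names are hypothetical, and the sketch assumes
that all time stamps and durations are expressed in seconds and the bit rate in bits per
second.

```python
def video_offset(dts_vl1, dts_vf2, frame_duration):
    """Step 213: V_OFFSET = DTS_VL1 - DTS_VF2 + one video frame duration."""
    return dts_vl1 - dts_vf2 + frame_duration


def audio_offset(pts_al1, pts_af2, aau_duration):
    """Step 223: A_OFFSET = PTS_AL1 - PTS_AF2 + one AAU duration."""
    return pts_al1 - pts_af2 + aau_duration


def pcr_offset(pcr_el1, pcr_ef2, bit_rate):
    """Step 233: PCR_OFFSET = PCR_eL1 - PCR_eF2 + 8 / bit rate (one byte time)."""
    return pcr_el1 - pcr_ef2 + 8.0 / bit_rate
```

Adding V_OFFSET to the first DTS of the second clip places it exactly one frame
interval after the last DTS of the first clip, which is the condition sought in step 153.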

With reference to FIG. 41, there is shown a flow chart of a procedure for
re-stamping the time stamps in the portion of the second clip appearing in the spliced
transport stream. In step 241, the video time stamp offset V_OFFSET is added to the DTS
and PTS fields of all video PES packets in the second clip. Next, in step 242, the audio
time stamp offset A_OFFSET is added to the PTS fields of all audio PES packets in the
second clip. In step 243, the PCR time stamp offset PCR_OFFSET is computed by
invoking the subroutine of FIG. 40. In step 244, PCR_OFFSET is added to all PCR
records in the second clip. In step 245, the PID fields of the TS packets of the various
streams in the second clip are re-stamped based on their associations with the various
streams of the first clip. Finally, in step 246, the continuity counter fields of the TS
packets of the various streams are re-stamped in the second clip so as to achieve stream
continuity from the first clip to the second clip.
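Two of the re-stamping operations of FIG. 41 may be sketched in Python as follows. The
sketch relies on two properties of MPEG-2 transport streams: PTS and DTS fields are
33-bit counts of a 90 kHz clock, and the continuity counter is a 4-bit field that
increments per PID for packets carrying payload. The function names, the list-of-tuples
packet representation, and the choice to start an unseen PID at counter value 0 are
assumptions of this sketch.

```python
PTS_MODULUS = 1 << 33   # PTS/DTS fields are 33 bits wide


def restamp_timestamp(ticks, offset_ticks):
    """Steps 241-242: add an offset (in 90 kHz ticks) with 33-bit wrap-around."""
    return (ticks + offset_ticks) % PTS_MODULUS


def restamp_continuity(last_counters, packets):
    """Step 246: continue each PID's 4-bit continuity counter across the splice.

    last_counters: dict mapping PID -> last counter value used in the first clip
    packets:       (pid, has_payload) tuples of the second clip, with PIDs
                   already re-mapped as in step 245
    Returns the new counter value for each packet, in order.
    """
    counters = dict(last_counters)
    restamped = []
    for pid, has_payload in packets:
        value = counters.get(pid, 15)     # assumption: an unseen PID starts at 0
        if has_payload:                   # counter advances only with payload
            value = (value + 1) % 16
        counters[pid] = value
        restamped.append(value)
    return restamped
```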

In order to solve certain buffer problems and also to avoid artifacts in case of clips
starting with an open GOP, it sometimes becomes necessary to remove some frames. If
these frames are removed from the stream without any replacement, a "hole" in the frame
presentation time sequence will be generated. In this case, the result depends on the
decoder implementation (i.e. on how a particular decoder handles this situation). For
example, some decoders try to correct the problem by themselves. More precisely, they
do not take the recorded DTS values into account and continue decoding the frames they
have received until they possibly enter an underflow state. The observed result is a freeze
of the scene which occurs some frames after the splicing point (sometimes 10 frames). In
other decoders the consequences could be more catastrophic.

To avoid any unpleasant effect in a controlled fashion, the frames which cannot 
be decoded are replaced by encoded frozen frames. These frames are encoded such that 
they effectively repeat a previous frame in the decoding order. They can be either B- 
frames or P-frames. The frozen frame implementation relies on null motion vectors and 
no coded transform coefficients. Consequently, these frames are completely MPEG-2 
compliant and the decoder doesn't encounter any discontinuity in the stream. 

With these frozen frames, decoder freeze can be controlled to make the visual 
perception cleaner. There are three different types of encoded frozen frames that can be 
generated for this purpose. These three types are a P-frame repeating the previous I or P 
frame (in display order), a B-frame repeating the previous I or P frame (in display order), 
and a B-frame repeating the following I or P frame (in display order). Moreover, any 
frozen frame should not be separated from the frame it is repeating by some live (i.e. non- 
frozen) frames in display order. To avoid any undesirable flickering effect due to the 
presence of two fields within an interlaced frame, the frozen frames are generated using 
the dual motion prediction type which allows the encoding of one field by extrapolation 
(prediction) from the dual field. 

With reference to FIG. 42, there is shown a diagram of the pixels in a video frame
250. According to the MPEG video encoding standard, the video frame can be
subdivided into a rectangular array of macroblocks, where each macroblock 251 includes
a square array of pixels. Pixels on the lower right and lower borders of a frame that do
not fit into full size macroblocks are handled as follows. The frame horizontal and
vertical sizes are completed to the nearest integer multiples of macroblock horizontal and
vertical sizes by right-most column and lower-most row repetitions respectively. The
MPEG standard also permits slices, or linear arrays of contiguous macroblocks, to be
defined, with the maximum sized slices including an initial macroblock in a left-most
column and a final macroblock in a right-most column. For example, a maximum size
slice is shown including all of the macroblocks in the third row of the macroblock
matrix. A large number of consecutive macroblocks in a slice can be very efficiently
encoded by a command to skip that number of macroblocks immediately after the initial
macroblock in the slice. In case of a skip, the encoding information (i.e., the motion
vectors and quantized DCT coefficients for the prediction error) is common to all skipped
macroblocks and therefore is not repeated for each skipped macroblock.

It is possible to encode a "freeze frame" in various ways, such that the encoding
of the "freeze frame" will result in a selected variable size. The smallest freeze frame
will define the maximum number of skipped macroblocks and maximum size slices, and
a null set of DCT coefficients for the prediction residual and zero valued displacement
vectors. The largest freeze frame will define, for each of the non-skipped macroblocks, a
set of zero valued DCT coefficients for the prediction residual and zero valued
displacement vectors. Freeze frames of intermediate sizes can be defined by using
different numbers of skipped macroblocks, and then various sizes of slices of
macroblocks. Also, a slight adjustment can be made by padding. Padding is done by
placing some stuffing bytes in the adaptation field (see FIG. 5).




With reference to FIG. 43, there is illustrated a problem of non-obsolete audio TS
packets 260 that follow in the first clip after the end 261 of the video TS packet for the
[...] cannot be repositioned into existing packet positions in the second clip after the
[...] seamless video splicing. In particular, the number of bits of the remaining
non-obsolete audio packets must either fit in the gap that needs to be compensated in
step 157 of FIG. 24, or will require additional space to be opened up in the clip in step
158 (for example by reducing the size of a freeze frame or increasing the number of video
frames in the first clip that must be replaced by freeze frames) to make room for them.
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With reference to FIG. 44, there is shown a procedure of a re-formatting operation 
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clip after the end 261 of the video TS packet for the Out Point. In a first step 271, the 
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obsolete audio packet in the second TS. Next, in step 272, the procedure replaces any of 
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the "k" null packets or obsolete audio packets in the second TS stream with 
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corresponding ones of the "j" non-obsolete audio packets in the first TS stream, 
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point. This is achieved by re-stamping all of the program identification indices (PIDs) 
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within the second clip with the associated stream PIDs of the first clip. The program 
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identification indices must be the same for the different component streams which form a 
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continuation before and after the splicing points. In addition, the continuity counter 
sequence for each elementary stream must be evolving continuously across the splicing 
point. Therefore, typically all of the continuity counter values are re-stamped for each 
transport packet of the second stream. 

There can also be a need for some further reformatting to permit the In Point to be 
an I frame of an open GOP, and to select where freeze frames should be inserted in the 
last GOP before the Out Point. When the clip to decode and present for viewing starts 
with an open GOP, some B-frames will typically contain references to a frame that was in 
the previous GOP at the encoding time. These reference frames are not present in the 
new stream. So, it is not possible to play these B frames without artifacts. They must be 
removed. However, in order to keep an MPEG-2 compliant stream and also to preserve 
frame accuracy, these B frames are replaced by encoded frozen frames referring to a 
previous (in display order) I or P frame. As these B frames sent after the first I frame of 
the clip to start, are presented before it, the freeze will occur just at the splicing. The last 
anchor frame of the completed clip is repeated one or several times, but the new clip 
starts without any artifacts. 

At the end of a clip, before decoding the last GOP to play, the procedure 
determines which changes are to be performed in this GOP to avoid buffer problems. To 
do this, the procedure accesses the following data: 

- the last GOP size (in bytes)

- the last GOP size (in frames)

- the DTS-PCR_e at the beginning of this GOP (i.e. for its first frame) and the
ending delay T_END = DTS_L1 - T_e at the end of this GOP, which can be computed

- the number of frames to play from this GOP, which is not necessarily equal to
the full GOP size.

To rebuild this GOP, the procedure has access to the GOP structure and the size of each
frame. So, the last GOP is read in full into the memory. This is done only if the
procedure needs to terminate with an incomplete GOP. If a play-at-time interrupt arrives
during playing a clip, the procedure determines in advance the number of frames
remaining before the transition to the next clip to prepare the GOP.

The frames to be replaced by encoded frozen frames depend on the GOP
structure. This point will be illustrated by examples.

Example 1: Incomplete GOP with 3n frames.

Transport order: I B B P B B P B B P B B
Display order: 2 0 1 5 3 4 8 6 7 11 9 10

Case 1: The procedure has to play 3n frames. The procedure takes the first 3n
frames without any problem since the set of the first 3n frames in the transport order is
the same as the set of the first 3n frames in display order as shown above.

Example 2: Incomplete GOP with 3n+1 frames. (Case of 3n+1=7 is illustrated.)

Transport order: I B B P B B Pff
Display order: 2 0 1 5 3 4 6

Case 2: The procedure has to play 3n+1 frames. Then the procedure replaces the
last frame by a frozen frame as shown above. Pff implements a freeze of P5.

Example 3: Incomplete GOP with 3n+2 frames. (Case of 3n+2=8 is illustrated.)

Transport order: I B B P B B Pff Bff
Display order: 2 0 1 5 3 4 7 6

Case 3: The procedure has to play 3n+2 frames. Then the procedure inserts two
frozen frames as shown above. Both Bff and Pff implement freeze of P5.

Example 4: Structurally closed IBBP... GOP.

Transport order: I P B B P B B P B B P B B
Display order: 0 3 1 2 6 4 5 9 7 8 12 10 11

Within this GOP structure playing 3n+1 frames is trivial and can be achieved
without any freeze frames. Playing 3n+2 frames can be achieved by freezing just one
frame as illustrated below for the case of 3n+2=8:

Transport order: I P B B P B B Pff
Display order: 0 3 1 2 6 4 5 7

where Pff implements a freeze of P6. Similarly, playing 3n frames can be
achieved by freezing two frames as illustrated below for the case of 3n=9:

Transport order: I P B B P B B Pff Bff
Display order: 0 3 1 2 6 4 5 8 7

where Pff and Bff both implement a freeze of P6.

These changes are applied before taking into account the buffer level. They provide a 
modified GOP tailored for achieving the desired temporal frame accuracy. After these 
transformations related to the GOP structure are performed, the buffer level (DTS-PCR) 
at the end of this GOP is computed based on the resultant (i.e. modified) GOP structure. 
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If the new GOP's (i.e. the first GOP of the clip to start) buffer level is too high 
and if there is no padding bandwidth available in the end of the first clip, then additional 
frames are replaced by encoded frozen frames, starting from the last one in transport 
order and proceeding one frame at a time (towards the beginning of the first clip) until the 
GOP size becomes small enough. 

These GOP transformations can be done in advance, as soon as the number of 
frames to play in the current clip becomes known. This means that, if there is a play-at- 
time command to start the next clip, then the timer must expire late enough to allow the 
computation of frames remaining to play and also the preparation of the last GOP. 

With reference to FIG. 45, it is possible to pre-compute metadata that can speed 
up the process of seamless splicing. This is especially useful when the seamless splicing 
must be done on the fly, during real-time delivery of a TS stream. For example, a stream 
server of the video file server 20 of FIG. 1 performs metadata computation (281 in FIG. 
45) when the file server records the MPEG TS stream in a MPEG file 282. As the MPEG 
TS data 285 becomes recorded in the MPEG file 282, the metadata is recorded in a 
header of the MPEG file. The header, for example, is a first megabyte of random- 
accessible address space in the file. Preferably, the metadata includes some metadata 283 
associated with the clip as a whole, and metadata 284 associated with the individual 
GOPs. Preferably, the metadata 284 associated with the individual GOPs is stored in a 
GOP index table. 

The metadata 283 associated with the clip as a whole includes a program number, 
the video frame rate, status of the clip, the number of GOPs in the clip, stream identifiers 
for the various elementary streams in the TS, a byte index indicating a beginning position 
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much MPEG TS data becomes written to the MPEG TS file that there is insufficient 
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space in the 1 megabyte header to hold entries for all of the GOPS, then the GOP index 
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can be decimated by a factor of two by writing the content of the GOP entry for GOP no. 
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1 2 over the GOP entry for GOP no. 1 , writing the content of the GOP entry for GOP no. 4 

over the GOP entry for GOP no. 2, writing the content of the GOP entry for GOP no. 6
over the entry for GOP no. 3, etc.

With reference to FIG. 47, there is shown a flow chart for GOP index decimation.
In a first step 331, before computing attributes for any GOP, a GOP decimation factor is
set to one in the metadata for the clip. (This decimation factor, for example, is used to
find a GOP table index for a given GOP number by dividing the given GOP number by
the decimation factor.) Computation of attribute values for the GOPs found in an
ingested TS and the writing of those attribute values to respective entries in the GOP
index continues in step 332 until the end of the GOP index is reached in step 333. Then
the procedure continues to step 334 where the GOP index is decimated by a factor of two.
Finally, the decimation factor is increased by a factor of two, and the procedure loops
back to step 332.
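One possible reading of the decimation loop of FIG. 47 is sketched below in Python. The
class name, the in-memory list standing in for the fixed-size header, and the rule that
after decimation only GOPs whose number is a multiple of the decimation factor are
recorded are assumptions of this sketch; the flow chart itself does not spell out which
GOPs are recorded after a decimation.

```python
class GopIndex:
    """Hypothetical fixed-capacity GOP index with decimation by two."""

    def __init__(self, capacity):
        self.capacity = capacity     # number of entries that fit in the header
        self.entries = []            # entry k (1-based) describes GOP k * decimation
        self.decimation = 1          # step 331: decimation factor starts at one
        self.gops_seen = 0           # GOP numbers are counted from 1

    def add(self, attributes):
        """Step 332: record attributes of the next GOP found in the TS."""
        self.gops_seen += 1
        if self.gops_seen % self.decimation != 0:
            return                   # GOP falls between retained entries
        if len(self.entries) == self.capacity:        # step 333: index is full
            # Step 334: entry 2 overwrites entry 1, entry 4 overwrites entry 2, ...
            self.entries = self.entries[1::2]
            self.decimation *= 2
            if self.gops_seen % self.decimation != 0:
                return
        self.entries.append(attributes)

    def lookup(self, gop_number):
        """Table index = GOP number divided by the decimation factor."""
        index = gop_number // self.decimation
        if index < 1 or index > len(self.entries):
            return None
        return self.entries[index - 1]
```

With this scheme a lookup for a GOP that is no longer indexed returns the nearest
retained GOP at or before it, from which a short forward search of the TS data could
locate the requested GOP.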

Some of the metadata is of high priority and some of the metadata is of lower
priority. In the absence of sufficient computational resources, the high priority metadata
can be pre-computed without pre-computing the lower priority metadata. For example,
the frame rate for the clip is a high priority item but the number of frames in the clip is a
low priority item. The frame number and the pointer to the corresponding MPEG TS
data (i.e., a byte index) are high priority GOP attributes. The flag indicating whether
the GOP is open or closed is a low priority item. In the situation where it is possible
that a GOP entry will include the high priority items but not the low priority items, the
low priority items are encoded with an indication of whether they are valid or not. This
can be done by initially setting the low priority items to predetermined invalid values
indicating that valid attribute values are not yet computed.

With reference to FIG. 48, there is shown a flow chart of metadata computations
for a next GOP processed in a TS. In a first step 341, if resources for computing high
priority metadata are not presently available, then the computations for the GOP are
terminated. Otherwise, the procedure continues to step 342, where the high priority
metadata is computed for the GOP. Then, in step 343, if resources for computing low
priority metadata are not available, then the computations for the GOP are terminated.
Otherwise, the procedure continues to step 344, where the low priority metadata is
computed for the GOP.

The GOPs in a TS can be fixed size (same size throughout the TS) or variable size
in terms of the number of video frames they contain. If the GOPs are of a fixed size, then
each has an integral number of "n" frames. In this case, assuming that the first frame
number in the TS is "m", then the number of the GOP containing a specified frame "p"
can be calculated as the integer quotient of (p-m)/n plus one. If the GOPs are of variable
size, then the metadata may include an average GOP size; i.e., an average number of
frames per GOP. In this case, to find the GOP containing a specified frame, the GOP
number is estimated using the same formula (using the average number of frames per
GOP for n), and then the GOP index table is searched in the neighborhood of this GOP
for the GOP containing the specified frame number.
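The two look-ups described above may be sketched in Python as follows. The function
names are hypothetical, and the variable-size case assumes that the GOP index table has
been reduced to an ascending list holding the first frame number of each GOP.

```python
def gop_number_fixed(p, m, n):
    """GOP number (counted from 1) of frame p, for fixed-size GOPs of n frames
    when the first frame number in the TS is m."""
    return (p - m) // n + 1


def gop_number_variable(p, m, avg_n, first_frames):
    """GOP number of frame p for variable-size GOPs.

    avg_n:        average number of frames per GOP (may be fractional)
    first_frames: first_frames[k] is the first frame number of GOP k + 1
    """
    # Estimate with the fixed-size formula, then clamp into the table.
    guess = int((p - m) // avg_n) + 1
    guess = min(max(guess, 1), len(first_frames))
    k = guess - 1
    # Search the neighborhood of the estimate, backward then forward.
    while k > 0 and first_frames[k] > p:
        k -= 1
    while k + 1 < len(first_frames) and first_frames[k + 1] <= p:
        k += 1
    return k + 1
```

For example, with m = 0 and n = 15, frame 30 is the first frame of GOP number 3.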

The metadata contains information on the clip which is used during the play operation to
check the buffer levels and to adjust these levels at the splicing time. The fundamental
information item of metadata is the difference DTS-PCRe for each video access unit within
the video stream, which is representative of the buffer level in the sense described
previously. It should be noted that DTS values are defined for I and P frames, for which
the decoding and presentation times differ since these frames are used as references by
other P and B frames. However, for type B frames only PTS is defined, which is identical
to the DTS of the same frame.

A subsection of the metadata includes the following two values:

First PTS: This is the PTS of the first frame in display order.

First PCR (PCRe,0): This is a calculated (i.e., extrapolated) PCR value corresponding to
the beginning (i.e., the first byte) of the file. This value is computed from the bit-rate,
the value of the first genuine PCR record and its byte position within the file.
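
One possible way to carry out the extrapolation of the First PCR is sketched below. The
sketch assumes a constant bit rate, PCR values expressed in cycles of the 27 MHz clock,
and wrap-around at the full PCR range; the function and constant names are hypothetical.

```python
PCR_HZ = 27_000_000            # PCR clock frequency
PCR_MODULO = (1 << 33) * 300   # 33-bit base times 300, the range of a PCR value


def extrapolate_first_pcr(first_pcr: int, first_pcr_byte_pos: int, bit_rate: int) -> int:
    """Extrapolate the PCR back from the first genuine PCR record to byte 0 of the file."""
    # Time needed to transmit the bytes that precede the first genuine PCR record.
    ticks = round(first_pcr_byte_pos * 8 * PCR_HZ / bit_rate)
    return (first_pcr - ticks) % PCR_MODULO
```

For example, a first genuine PCR of 27,000,000 found at byte position 188,000 of a 4
megabit per second file extrapolates to 16,848,000 at the first byte of the file.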

Based on these two values, for each I frame the procedure computes both the DTS of this
frame and also the PCRe value corresponding to the beginning of this frame within the
file. In order to perform these calculations, the procedure also accesses the frame number
(a cumulative frame count from the beginning of the file) and the byte position of the
beginning of this frame in the file, both of which are recorded in the GOP index table.

The GOP index table forms a major sub-section of the metadata. It is easy to see that,
assuming one I frame per GOP, the cumulative frame count values at I pictures also become
their cumulative temporal references (referring to display order). Then, it is
straightforward to calculate a PTS value for each of these I frames assuming a continuous
video play-out. Finally, assuming a known uniform GOP structure, these presentation time
stamps of I pictures can be easily converted to decoding time stamps based on the
principle that the decode time instant of an anchor frame is the same as the presentation
time instant of the previous anchor frame. So, the DTS-PCRe difference can be computed in
advance for each I frame of the file and consequently, whatever the start position is in a
clip for play-out, the required buffer level to be built up can be known in advance.
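
The chain of calculations described above could be sketched as follows. This is an
illustrative sketch only: it assumes that all time values are held in 90 kHz units, that
the bit rate is constant, that time stamp wrap-around can be ignored, and that the uniform
GOP structure is summarized by a hypothetical parameter anchor_spacing giving the number of
frame periods between consecutive anchor frames in display order.

```python
from fractions import Fraction

TICKS_PER_SECOND = 90_000  # assumption: PTS, DTS and PCRe all in 90 kHz units


def i_frame_dts_minus_pcre(frame_count: int, byte_pos: int,
                           first_pts: int, first_pcre: int,
                           frame_rate: Fraction, bit_rate: int,
                           anchor_spacing: int) -> Fraction:
    """DTS-PCRe for an I frame, from its GOP index entry and the two 'First' values."""
    frame_period = Fraction(TICKS_PER_SECOND) / Fraction(frame_rate)
    # PTS from the cumulative frame count, assuming continuous play-out.
    pts = first_pts + frame_count * frame_period
    # Decode time of an anchor frame equals presentation time of the previous anchor.
    dts = pts - anchor_spacing * frame_period
    # PCRe at the first byte of the frame, extrapolated from the start of the file.
    pcre = first_pcre + Fraction(byte_pos * 8 * TICKS_PER_SECOND, bit_rate)
    return dts - pcre
```

For a 25 frame per second, 4 megabit per second clip with a First PTS of 20,000, a First
PCR of 0 and anchor frames three frame periods apart, an I frame with cumulative frame
count 15 at byte position 250,000 yields a difference of 18,200 ticks, or about 0.2 second.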

With reference to FIG. 49, there are shown further details of the components involved in
the ingestion of an MPEG TS into a stream server computer 291 for recording in the cached
disk array, and for real-time splicing during real-time transmission of an MPEG TS from
the cached disk array and from the stream server computer to a destination such as one of
the clients (54 in FIG. 1). The stream server computer 291 is interfaced to the network
(25 in FIG. 1) via a network interface board 292. The network interface board 292, for
example, is a DVB board, an ATM board, an Ethernet board, a Fiber Channel board, or a
Gigabit Ethernet board. The network interface board 292 performs a direct memory access
upon buffers 293 in the random access memory 294 of the stream server computer 291 in
order to exchange MPEG TS data with the network (25 in FIG. 1). A software driver 295 for
the network interface board 292 initiates the direct memory access transfers. In
particular, the software driver 295 hands to the network interface board 292 the RAM
address range of the data in the buffer for the DMA transfer. Real-time delivery of an
MPEG TS stream from the stream server 291 is controlled by a "house" clock signal 55. As
shown in FIG. 1, the house clock signal 55 is applied to each of the stream servers 21 and
the controller servers 28,

the data must be delivered to ensure that any jitter is within the limit that the MPEG
standard imposes on the PCR time values. The PCR values must be accurate within 20 cycles
of a 27 MHz decoder clock. Moreover, the difference between neighboring PCR values in the
TS is kept less than 100 msec; otherwise, the decoder clock will reset.

299. The write access program module 299 removes the pointer from the queue 298, and then
writes the data from the indicated buffer to an MPEG TS file of the file system 300. The
file system 300 writes the data to the cached disk array 23 in an asynchronous write
operation to data storage in the cached disk array. The file system maps the address
space of each file to one or more 16 megabyte blocks of storage in the cached disk array.
(The active one of the controller servers 28, 29 in FIG. 1 has supervisory control over
these operations of the stream server computer 291.)

To perform splicing or other real-time MPEG processing during real-time 
delivery of an MPEG TS to the network (25 in FIG. 1), a read access program module 
301 invokes the file system 300 to read the MPEG TS data from an MPEG TS file in the 
cached disk array 23 in an asynchronous read operation upon data storage in the cached 
disk array, and the read access program module writes the MPEG TS data into an 
assigned one of the buffers 293. When the read access program 301 has filled the 
assigned one of the buffers 293, it places a pointer to the buffer on a FIFO buffer pointer 
queue 302. An MPEG processing program module 303 services the queue 302. Upon 
finding that the queue 302 is not empty, the module 303 removes the pointer from the 
queue 302 and accesses the buffer 293 indicated by the pointer. 

For splicing, the MPEG processing module 303 will access two consecutive 
buffers, one containing a first clip and the other containing a second clip. The splicing 
procedure modifies the first clip in its assigned buffer so that the first clip will represent 
the spliced TS. Splicing in real time requires parsing the TS stream in real time for audio 
PES packet headers, and parsing the audio PES packets in real time for the AAUs. Also 
the TS stream is parsed in real time to find the GOP header and to extract the display 
order and type (i.e., open or closed) from the GOP header. The AAUs around the splice 
point are identified as obsolete or not, the non-obsolete AAUs are reformatted and the 
obsolete AAUs are eliminated in real time. The TS stream around the splice point is
modified in real time for seamless video splicing. The time stamp offsets are computed and
the spliced TS stream following the splice point has all of its time stamps and continuity
counters re-stamped in real time.

When the MPEG processing module 303 is finished with the splicing operation, it places the
pointer to the buffer of the spliced TS into yet another FIFO buffer pointer queue 304.
The queue 304 is serviced by the software driver 295. Upon finding that the queue 304 is
not empty, the software driver 295 removes the pointer from the queue 304, and causes the
network interface board to initiate a direct memory access transfer of the spliced TS from
the indicated buffer 293 to the network (25 in FIG. 1). The TS data is buffered from the
MPEG processing to the network interface board because the network interface board has
priority access to the stream server RAM. (The active one of the controller servers 28, 29
in FIG. 1 has supervisory control over these operations of the stream server computer
291.)

It is also possible for the new spliced TS to be stored in the cached disk array 23, with
or without concurrent transfer to the network (25 in FIG. 1). In this case, the software
driver 295 passes the buffer pointer from the queue 304 to the queue 296. Overall, it is
seen that the buffers 293 function as a kind of carousel to distribute clips and MPEG TS
stream data to successive processing, storage, and stream delivery functions, and the MPEG
TS streams can be easily edited and spliced in the process.
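
The carousel of buffers and FIFO buffer pointer queues on the delivery path could be
modeled, purely for illustration, by the following single-threaded Python sketch. The
class and method names are hypothetical, buffer indices stand in for buffer pointers, and a
list named sent stands in for the direct memory access transfer to the network.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class Carousel:
    """Hypothetical model of buffers 293 with queues 302 and 304."""

    def __init__(self, n_buffers: int) -> None:
        self.buffers: List[Optional[bytes]] = [None] * n_buffers
        self.free: Deque[int] = deque(range(n_buffers))
        self.queue_302: Deque[int] = deque()  # read access 301 -> MPEG processing 303
        self.queue_304: Deque[int] = deque()  # MPEG processing 303 -> software driver 295
        self.sent: List[bytes] = []

    def read_access(self, data: bytes) -> bool:
        """Fill a free buffer and queue its pointer; False if no buffer is free."""
        if not self.free:
            return False
        idx = self.free.popleft()
        self.buffers[idx] = data
        self.queue_302.append(idx)
        return True

    def mpeg_processing(self, transform: Callable[[bytes], bytes]) -> bool:
        """Service queue 302: process one buffer in place and pass its pointer on."""
        if not self.queue_302:
            return False
        idx = self.queue_302.popleft()
        self.buffers[idx] = transform(self.buffers[idx])
        self.queue_304.append(idx)
        return True

    def driver(self) -> bool:
        """Service queue 304: 'transmit' one buffer and return it to the carousel."""
        if not self.queue_304:
            return False
        idx = self.queue_304.popleft()
        self.sent.append(self.buffers[idx])
        self.buffers[idx] = None
        self.free.append(idx)
        return True
```

Because only pointers move between the queues, the data itself is written once by the read
access stage and is then processed and transmitted from the same buffer.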

The number of buffers 293 that are allocated for use in the carousel during the reading,
writing, or generation of a spliced TS is a function of the bit rate of the TS. A higher
bit rate requires more buffers. Each buffer, for example, has 64 kilobytes of memory, and
the data rate can range from about 100 kilobits per second to 130 megabits per second. The
buffers smooth out variations in bit rate that are not deterministic in time. The buffer
size can be much smaller than a clip, and smaller than a portion of a clip that is needed
for splicing. In this case, when splicing a first stream (S1) to a second stream (S2),
alternate buffers in sequence around the carousel can contain data from the same clip. For
example, a first buffer may contain a next-to-last segment of the first clip, a second
buffer may contain a first segment of the second clip, a third buffer may contain a last
segment of the first clip, and a fourth buffer may contain a second segment of the second
clip.
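
The dependence of the number of buffers on the bit rate can be made concrete with a small
calculation. The sketch below assumes that 64 kilobytes means 65,536 bytes and that the
carousel should hold a fixed time window of stream data; the window length is a
hypothetical parameter, since no allocation rule is specified above.

```python
import math

BUFFER_BYTES = 64 * 1024  # assumption: 64 kilobytes taken as 65,536 bytes


def buffer_duration_seconds(bit_rate: float, buffer_bytes: int = BUFFER_BYTES) -> float:
    """Play-out time represented by one full buffer at the given bit rate."""
    return buffer_bytes * 8 / bit_rate


def buffers_for_window(bit_rate: float, window_seconds: float,
                       buffer_bytes: int = BUFFER_BYTES) -> int:
    """Buffers needed to hold window_seconds of data (hypothetical allocation rule)."""
    return max(1, math.ceil(bit_rate * window_seconds / (buffer_bytes * 8)))
```

Under these assumptions one buffer holds about 5.2 seconds of a 100 kilobit per second
stream but only about 4 milliseconds of a 130 megabit per second stream, so a one second
window needs one buffer in the first case and 248 buffers in the second.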

The metadata computation module 297 parses the content of its assigned buffer. The parsing
typically continues from one buffer to a next buffer in the carousel, for the usual case
where each buffer holds less than the full duration of the TS. The parsing counts frames,
builds GOP entries, calculates instantaneous bit rates and other GOP attributes, and looks
for error conditions. Each GOP header is parsed for display order and type (i.e., open or
closed).
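
A minimal sketch of the GOP header parsing is given below. It relies on the MPEG-2 video
syntax, in which a GOP header begins with the start code 0x000001B8 followed by a 25-bit
time code, a closed_gop bit and a broken_link bit. It assumes that the video elementary
stream bytes have already been extracted from the TS packets into one contiguous byte
string, so a header straddling two carousel buffers is not handled; the function name and
the dictionary keys are hypothetical.

```python
from typing import Dict, Iterator

GOP_START_CODE = b"\x00\x00\x01\xb8"


def parse_gop_headers(es: bytes) -> Iterator[Dict[str, object]]:
    """Yield the time code and open/closed type of each GOP header found in es."""
    pos = es.find(GOP_START_CODE)
    while pos != -1 and pos + 8 <= len(es):
        bits = int.from_bytes(es[pos + 4:pos + 8], "big")
        yield {
            "offset": pos,
            "drop_frame": bool((bits >> 31) & 1),
            "hours": (bits >> 26) & 0x1F,
            "minutes": (bits >> 20) & 0x3F,
            # bit 19 is a marker bit
            "seconds": (bits >> 13) & 0x3F,
            "pictures": (bits >> 7) & 0x3F,
            "closed_gop": bool((bits >> 6) & 1),
            "broken_link": bool((bits >> 5) & 1),
        }
        pos = es.find(GOP_START_CODE, pos + 4)
```

A GOP whose closed_gop bit is clear is an open GOP, whose leading B frames may refer to the
preceding GOP.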

The MPEG processing 303 may use a number of flags. These MPEG processing flags include the
following:

- 0x100: re-stamp time records flag

If this flag is set then all of the PCR and DTS/PTS records are recomputed and re-stamped
so that they are continuous across splicing transitions.

- 0x200: discontinuity flag

If this flag is set, the discontinuity flag of the adaptation field in the TS packet
headers is set following the splicing point.

- 0x1000: rate-based padding flag

This bit is not used by the MPEG processing itself. If padding is necessary since the
session bit-rate is greater than the clip bit-rate, the right amount of padding will be
inserted in any case. However, it is used by the video service to allow appending clips
having a bit-rate smaller than the session bit-rate. If it is not set, the video service
allows only clips having the same bit-rate as the current session.

- 0x2000: allow removal of B frames in an open GOP

If this flag is not set then no frames from any clip can be removed or replaced. This bit
must be set only if clips are encoded with an encoder set up to generate clips that can be
spliced.

- 0x20000: disable audio splicing flag

This bit, when set, disables all of the audio processing around the splicing points except
for the PTS and PCR re-computation. All of the audio present in the clip is played in this
case.
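
For illustration, the flag values listed above could be represented as a bit mask in
Python as follows. The numeric values are those given above; the symbolic names and the
helper clip_rate_allowed, which restates the video service rule for the rate-based padding
flag, are hypothetical.

```python
from enum import IntFlag


class MpegProcessingFlag(IntFlag):
    RESTAMP_TIME_RECORDS = 0x100
    DISCONTINUITY = 0x200
    RATE_BASED_PADDING = 0x1000
    ALLOW_B_FRAME_REMOVAL = 0x2000
    DISABLE_AUDIO_SPLICING = 0x20000


def clip_rate_allowed(flags: MpegProcessingFlag,
                      clip_bit_rate: int, session_bit_rate: int) -> bool:
    """Video service check: may a clip of this bit rate be appended to the session?"""
    if clip_bit_rate == session_bit_rate:
        return True
    # A lower-rate clip is accepted only when rate-based padding is enabled.
    return bool(flags & MpegProcessingFlag.RATE_BASED_PADDING) and \
        clip_bit_rate < session_bit_rate
```

A session that re-stamps time records and sets the discontinuity flag would then carry the
combined value 0x300.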

With reference to FIG. 50, there is shown a diagram that illustrates a metered file
transfer protocol (FTP). This protocol is useful for transfer of an MPEG TS stream from
the video file server (20 in FIG. 1) to an application 310. The application 310, for
example, is a client application on the network 25, or it could be another video file
server. The application 310 initiates the metered FTP by sending a copy command to the
active one of the controller servers 28, 29. The controller server sends a set bandwidth
command to the stream server to set the bit rate for the metered transfer of file data
between the stream server 291 and the application 310. The stream server then issues a
connect message to the application to open an IP channel for the transfer of the file
data. In the metered FTP protocol, the data transmission rate is controlled so that the
loading on the stream server is deterministic. The data transmission is TCP flow
controlled. For input to the stream server from the application, the stream server
controls the data rate by flow-control push-back. For transmission of data from the stream
server to the application, the stream server merely controls the rate at which it
transmits the data.

In the transmission control protocol (TCP), the stream server either opens or closes a
window of time within which to receive more data. The stream server indicates to the
application a certain number of buffers that are available to receive the data. In
addition, the stream server acknowledges receipt of the data.

In the metered FTP protocol, time is split up into one-second intervals, and at every 1/10
of a second, the average data rate is re-computed. An adjustment is made to a data
transmission interval parameter if the computed data rate deviates from the desired rate.
For example, for a desired 10 kilobyte per second transmission rate, the data transmission
size is set at one kilobyte, and the data transmission interval parameter is

the preloaded extra audio is replaced by audio data (i.e., audio access units) extracted
from the new clip's buffer pool at the splicing time.

The seamless splicing techniques described above can also be used to recover from failure
conditions that may destroy or corrupt a portion of an MPEG transport stream. For example,
a component of a data path in the cached disk array may fail, causing an MPEG TS from a
disk drive in the cached disk array to be interrupted for a short period of time while the
failure condition is diagnosed and the MPEG TS is re-routed to bypass the failed
component. As shown in the flow chart of FIG. 52, the MPEG processing module may be
programmed to recognize the failure (step 351) during the delivery of the MPEG TS to a
client (step 352). Once this failure is detected, the MPEG processing module 303 can fill
in this gap in the MPEG TS with null packets or freeze frames with correct PCR values
(step 353). By inserting correct PCR values at intervals shorter than the required limit
(less than 100 milliseconds), a client's decoder will not reset and can be kept in a ready
state. Once delivery of the MPEG TS to the MPEG processing module is reestablished (as
detected in step 354), the MPEG processing module seamlessly splices (step 355) the
re-established TS (as if it were a second stream or clip) to the TS of null packets or
freeze frames that it has been generating and sending to the client. Splicing could be
performed in a similar fashion in the set-top decoder box of FIG. 2 or the switch of FIG.
3 to compensate for temporary interruptions in the delivery of an MPEG TS to the set-top
decoder box or to the switch.
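
Step 353 could be sketched, for the null packet case, as follows. The packet layouts
follow the MPEG-2 transport stream format (188-byte packets, sync byte 0x47, null PID
0x1FFF, PCR carried in the adaptation field as a 33-bit base and a 9-bit extension). The
sketch assumes a constant bit rate, carries each PCR in an adaptation-field-only packet on
the PCR PID, and uses a hypothetical 40 millisecond PCR spacing; freeze frame generation is
not shown, and the function names are not taken from the drawings.

```python
from typing import List

PACKET_SIZE = 188
PCR_HZ = 27_000_000
NULL_PACKET = bytes([0x47, 0x1F, 0xFF, 0x10]) + b"\xff" * 184


def pcr_packet(pid: int, pcr: int, cc: int) -> bytes:
    """TS packet with no payload whose adaptation field carries only a PCR."""
    base = (pcr // 300) % (1 << 33)
    ext = pcr % 300
    # 0x20: adaptation field only; the continuity counter does not advance.
    header = bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x20 | (cc & 0x0F)])
    pcr_field = ((base << 15) | (0x3F << 9) | ext).to_bytes(6, "big")
    # Length 183, flags 0x10 (PCR_flag), 6 PCR bytes, 176 stuffing bytes.
    return header + bytes([183, 0x10]) + pcr_field + b"\xff" * 176


def read_pcr(packet: bytes) -> int:
    """Recover the 27 MHz PCR value from a packet built by pcr_packet."""
    field = int.from_bytes(packet[6:12], "big")
    return (field >> 15) * 300 + (field & 0x1FF)


def fill_gap(n_packets: int, bit_rate: int, start_pcr: int, pcr_pid: int,
             cc: int, max_pcr_gap_s: float = 0.04) -> List[bytes]:
    """Generate n_packets of filler with PCR packets spaced at most max_pcr_gap_s apart."""
    ticks_per_packet = PACKET_SIZE * 8 * PCR_HZ / bit_rate
    max_gap_ticks = max_pcr_gap_s * PCR_HZ
    out: List[bytes] = []
    last = None
    for i in range(n_packets):
        # Send a PCR now if waiting one more packet would exceed the allowed spacing.
        if last is None or (i - last + 1) * ticks_per_packet > max_gap_ticks:
            out.append(pcr_packet(pcr_pid, start_pcr + round(i * ticks_per_packet), cc))
            last = i
        else:
            out.append(NULL_PACKET)
    return out
```

At 4 megabits per second this produces a PCR packet every 106 packets, a spacing of about
39.9 milliseconds, which stays well inside the 100 millisecond limit.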

In a similar fashion, the MPEG processing module in batch mode could check a clip for any
damaged portions, and once a damaged portion is found, remove it by seamlessly splicing
the end of the first good part of the clip to the beginning of the last good part of the
clip. Batch mode processing also would have the advantage that the audio and video buffer
levels could be determined exactly by simulation, so that it would be possible to
guarantee the absence of any buffer underflow or overflow at every point after the splice.
Batch mode processing, with audio and video buffer simulators, could also measure the
quality of spliced TS streams and determine whether or not the splices should be repaired
using the more accurate simulated buffer levels. The quality measurement could also
include an analysis of audio delay or skew, how many freeze frames are in the TS stream
and their clustering, and an analysis of PCR jitter. It would also be very easy for the
MPEG processing module to compute the audio skew and PCR jitter in real time during the
real-time transmission of an MPEG TS, and to display continuous traces of the audio skew
and PCR jitter to a system administrator.

In view of the above, there has been described the preparation of metadata for splicing of
an encoded digital motion video stream (such as an MPEG Transport Stream). The metadata is
prepared in real time while recording at the encoding bit rate, and faster than the
encoded bit rate for off-line encoding, independent of the bit rate and mechanisms for
ingestion of the data stream into data storage. Preprocessing is performed during a
metered file transfer protocol (FTP) and includes pseudo real-time encoding. The
preprocessing includes Group of Pictures (GOP) level pre-processing of splicing In Points
and results in an intimate linkage between metadata and the file system in which the video
data is stored. The preferred file system enables access to metadata in parallel to
writing the data on disk. The pre-processing is performed simultaneously with writing the
data to the disk using a carousel type buffer mechanism.